AN EXPERIMENTAL AND STATISTICAL STUDY OF
READING AND READING TESTS.¹


ARTHUR I. GATES 
Teachers’ College, Columbia University. 


A recent bibliography² contains titles of 18 tests for silent reading,
3 for oral reading and 6 for word knowledge (vocabulary). This
list bears witness to a keen interest and productive work in the 
measurement of reading ability, and it is encouraging that in the 
opinions of many, some of the latest tests are better instruments 
than most of the earlier ones. While it is doubtless desirable that 
the abilities of some should be devoted to the creation of new and 
better tests, it is imperative that extensive experimental and sta- 
tistical studies be made of the many tests now available, if we are 
to make rapid progress in the improvement of testing materials. 


Mrs. May Ayres Burgess has set an admirable example by accompanying
her recent test with a monograph³ in which she has stated
carefully the principles upon which her work is based together with 
an account of the construction of the scale, grade norms, measures 
of reliability and other necessary information. Too frequently 
scales are published before their usefulness is known. If the author 
of a test does not empirically discover its merits and defects, most 


¹This study was made possible by the generosity of Mr. Frank A. Vanderlip and the
interest and co-operation of Mr. Wilford M. Aiken, Founder and Director, respectively,
of the Scarborough School at Scarborough, N. Y. At the beginning of the academic
year 1920-21, a Department of Educational Research was organized in the School, under
the direction of the writer, and during the year he has enjoyed the able co-operation
of Miss Jessie DeSalle, who was primarily responsible for the testing, and Miss Ella
Woodyard, primarily responsible for the statistical work. The co-operation of the teaching
staff has been excellent.


²Bibliography of Tests for Use in Schools. The World Book Co., Yonkers, N. Y. 1921.


³May Ayres Burgess. The Measurement of Silent Reading. Russell Sage Foundation.
New York. 1921. Pp. 163. 
304 The Journal of Educational Psychology 


frequently it is not done at all. The fact is that at the present time 
we have practically no information concerning most of the tests, 
outside of tables of norms and possibly a few measures of con- 
sistency as determined by retest. We do not know whether they 
measure rate of reading, comprehension, both or neither. We do 
not know how speed is related to comprehension, as a consequence. 
We do not know whether different tests of “comprehension,” for 
example, measure the same or very different functions. We do not
agree as to what should constitute a criterion of reading ability.
In fact, two writers⁴ have recently questioned the very possibility
of testing “general reading ability” by a single scale. “The usual 
silent reading scale may be considered to measure—not silent read- 
ing ability in general, since there seems to be little evidence of any 
general factor of outstanding importance.” P. 29. While the evi- 
dence presented by these writers in support of their opinion is very 
meagre (inter-correlations of 4 short tests given to one grade), it 
betrays a startling dearth of evidence to the contrary. 


There is pretty fair agreement that reading ability depends upon
at least two elements—speed and comprehension—but just what 
shall constitute a measure of either is a matter of dispute. What 
competent workers think constitutes comprehension, for example, 
may be discovered by examination of existing tests, of which repre- 
sentatives are listed on a later page. Which of these, if any, best 
represents general ability to comprehend in reading, we do not know. 
Some hold that those who utilize an unassisted reproduction, e. g. 
Brown, are not testing comprehension in reading, but memory, 
ability to write English compositions and other abilities. Some 
hold that the scales which present brief paragraphs followed by 
questions, or directions to cross out a word, etc., measure the ability 
to reason, to infer, to solve puzzles, to resist suggestion, to attend 
closely, to discriminate between words, etc., none of which can be 
called precisely ability to read, although it is possible that all these 
may be involved in it.


In the interpretation of our results, the assumption has been made 
that general reading ability is not a fiction, but probably a very 
broad function. Like general intelligence, it is a reality, but not a 
single capacity, function or power. It is precisely a cross-section, 


⁴L. W. Pressey and S. L. Pressey. A Critical Study of the Concept of Silent Reading.
Journal of Educational Psychology. 1921. 12, 25-32.
average or composite of many functions. What functions are to be
included in the composite are in the first instance determined by 
competent judges. Experimental work results in eliminations and 
additions. In the case of reading, the assumption has been made 
that a composite score made up of a number of representative tests, 
carefully given, does represent general reading ability, and any
test, no matter what it may appear to be, is a test of reading ability
if it yields a satisfactory correlation with this criterion. It is, of
course, to be understood that, after experimental work has been 
done, a much more adequate criterion can doubtless be constructed. 


For the purpose of evaluating the usefulness of the several instru- 
ments other criteria are to be employed. They are, in the main: 


1. Reliability or consistency. Do the subjects perform identi- 
cally on each of several occasions? 

2. Objectivity. Will different experimenters secure the same 
results? 

3. Are the test units properly equalized or defined?

4. Are the standards (norms) of achievement satisfactory? 

5. Does the test properly differentiate or differentiate with sat- 
isfactory fineness, or register a sufficiently wide range of abilities? 

6. Are the various editions (forms) of the test equivalent? 


There are other criteria of more or less practical importance, such 
as cost, convenience to give or score, time required and interest developed
among those taking it.


THE EXPERIMENT IN GENERAL. 


This investigation was conducted during the past year at the
Scarborough School, Scarborough, New York. It was part of a more 
extensive study of the constitution of reading ability, with special 
reference to reading disabilities. All told, about a dozen reading 
and vocabulary tests were used, along with a greater number of 
tests of more specific abilities used for purposes of diagnosis. The 
present report is limited to the material obtained from a group of 
representative reading and vocabulary tests. 

The Subjects: The pupils of Grades III to VIII, inclusive, in the 
Scarborough School served as subjects. Each grade includes approximately
20 pupils. The records were complete for each pupil,
since absentees were given the tests on reappearance at school. 


With very few exceptions, the pupils of the Scarborough School are
above the median intelligence of the general population, according to
Terman’s norms. The median Stanford-Binet Intelligence Quotient 
for the grades considered is approximately 116.0. This restriction 
of the range of intelligence should be kept in mind in interpreting 
the correlations. 


The Tests Used:

1. Brown’s Silent Reading Test, Forms I and II.

2. The Burgess Scale, P. S., No. 1, given twice.

3. Courtis’ Silent Reading Test, No. 2, Forms I and III.

4. Monroe’s Silent Reading Test.

5. Thorndike’s Scale for the Understanding of Sentences, Alpha 2.

6. Thorndike-McCall Reading Test, Forms 1 and 2 in all grades;
Forms 1, 2, 3, 4, 5 in grades IV and VI.

7. Gray’s Oral Reading Test.

8. Woodworth-Wells Directions Test.

9. Holley’s Sentence Vocabulary Test.

10. A Vocabulary Test arranged by the writer.

11. A Pronunciation Test arranged by the writer.

12. Thorndike’s Visual Vocabulary Test. Used in all grades,
but, due to an error in administration, results of but two grades
were reliable.


The Composite Ratings for Speed and Comprehension.

A. Composite for Speed: The “rate” scores of the Courtis, Brown, Monroe
and the Burgess, each weighted roughly as the square root of the
time taken.

B. Composite for Comprehension Number 1: The “comprehension”
scores of the Brown, Courtis, Monroe, Thorndike-McCall, and
the Directions test, each weighted roughly as the square root of the
time taken. This composite was used before any information concerning
the correlations with the individual measures was available.
When it turned out that the Brown measure of comprehension
gave approximately a zero correlation with the composite, a
new one was constructed.

C. Corrected Composite of Comprehension: Same as Number 1, except
that the Brown score is omitted.
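
The weighting rule described above can be sketched in a few lines. This is only an illustration: the text says each score was weighted "roughly as the square root of the time taken" but does not say how scores from different tests were put on a common scale, so the conversion to standard scores here is an assumption, and the names `standardize` and `composite` are hypothetical.

```python
import math
import statistics


def standardize(scores):
    """Convert raw scores to standard (z) scores. ASSUMPTION: the source
    does not state how the tests were put on a common scale."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def composite(test_scores, minutes):
    """Weighted composite per pupil.

    test_scores: dict mapping test name -> list of raw scores (one per pupil)
    minutes:     dict mapping test name -> testing time in minutes
    Each test is weighted by the square root of its testing time.
    """
    weights = {name: math.sqrt(minutes[name]) for name in test_scores}
    z = {name: standardize(vals) for name, vals in test_scores.items()}
    n_pupils = len(next(iter(test_scores.values())))
    total_weight = sum(weights.values())
    return [
        sum(weights[name] * z[name][i] for name in test_scores) / total_weight
        for i in range(n_pupils)
    ]


# Hypothetical data: two tests of 4 and 9 minutes, three pupils.
example = composite({"a": [1, 2, 3], "b": [3, 2, 1]}, {"a": 4, "b": 9})
```

Under this rule a test four times as long as another counts only twice as heavily, which keeps the longest tests from dominating the criterion.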


Intelligence Tests. In addition to reading and vocabulary tests, 
the following measures of intelligence were secured : 


1. The Stanford Revision of the Binet Scale was given to Grades
3, 4, 5 and 6.


2. A composite of the following group tests, each weighted roughly
according to the time: Dearborn, Parts 1, 2, 3 (Grade III), Parts 4
and 5 (Grade IV and up); The National Intelligence Scale, Parts A
and B, all grades; Otis Primary, Form A, Grades III and IV, Advanced
Form A, Grade V and up; Meyer’s Mental Measure, all
grades; Haggarty, Delta I, Grade III, Delta 2, Grade IV and up;
Illinois, all grades; Holley Sentence Completion, all grades; and
Terman’s Group Test, Grades VII and VIII.


The Coefficients of Correlation. Coefficients of correlation, each
test with every other and with the composites, were computed for
each grade, the Pearson Product Moment formula being used
throughout. Corrections for attenuation and the use of the technique
of partial correlations in certain instances would add somewhat
to the information secured, but the task of computing more
than a thousand correlations presented in this paper was so great
that further statistical analysis could not be attempted at this time.
As measures of central tendencies and of variability the arithmetic
mean and the standard deviation have been used.
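
For readers who wish to reproduce such coefficients, a minimal sketch of the Pearson product-moment computation follows. The function name `pearson_r` and the sample scores are illustrative assumptions, not material from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of four pupils on two tests.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
```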

















In dealing with small grade groups, interpretation from coefficients
of correlation must be made with very great care, partly because
cause correlations do not necessarily indicate cause and effect or 
identity of function, and partly because the degree of correlation is 
dependent upon the range of performance which the group displays. 
We cannot pretend to have secured the relations of, say, speed and 
comprehension that would obtain with ideal materials and groups, 
but corrections for attenuation and for the restriction of range 
would certainly make them larger than they appear in this paper. 
A technique for correction of attenuation is available, but laborious. 
No technique has been devised for correction of restriction in range 
of performance. Grades are select groups, and our children are
selected entirely from the upper 50%, mostly from the upper 25% 
of the population. We do not know what the S. D.’s in our tests 
from a random selection of the universe of children would be. We 
could not make exact corrections if we did. Some work with our 
material and in other studies shows a very high correlation between 
the r’s and the S. D.’s of the measures. Our best assumption is that 
if all S. D.’s (other things being equal) were as large as the largest, 
the correlations would be as large as the largest. For purposes of 
comparing one test with another, which is really our main concern 
in this study, the data are adequate. Where the S. D.’s are so 
exceptional as to considerably affect the r’s it will be noted. 
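
The correction for attenuation mentioned above is not spelled out in the text. A common form, usually credited to Spearman, divides the observed correlation by the square root of the product of the two reliabilities; the sketch below assumes that form, and the function name and sample values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between two measures freed of chance error.

    r_xy: observed correlation between test x and test y
    r_xx: reliability coefficient of test x
    r_yy: reliability coefficient of test y
    ASSUMPTION: Spearman's form, r_xy / sqrt(r_xx * r_yy).
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r of .60, reliabilities of .64 and 1.00.
corrected = correct_for_attenuation(0.60, 0.64, 1.00)
```

Because both reliabilities are at most 1, the corrected value is never smaller in magnitude than the observed one, which agrees with the statement that such corrections would make the coefficients larger.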

The order of presentation will be (1) a survey of the general re- 
sults with reference particularly to the validity of the concept of 
general reading ability, and (2) an intensive study of certain fea- 
tures of the several tests, treated singly for the purpose of discover- 
ing more exactly what the scales do test and how well they test it. 


The Concept of General Reading Ability. Table I gives the means 
and standard deviations of the correlations of Grades III to VIII, 
inclusive, for the several tests and the composite scores of compre- 
hension and rate. The facts are an ample justification of the con- 
cept of general reading ability, especially when one recalls that no 
corrections have been made for attenuation or for the decided re- 
strictions of the range of abilities. When it is realized that the 
criterion for comprehension in some grades represents as many as 
8 hours of reading, under test conditions, of a wide variety of mate- 
rials; connected stories; short, easy directions; short, hard direc- 
tions; longer paragraphs of directions, easy and hard; paragraphs 
for interpretation of varied difficulty on all kinds of content, prose 
and poetry, and these at different times, the fact that a five-minute 
test (Burgess) should show a mean correlation with it of .8 speaks 
well for the usefulness of the concept and the test. With the excep- 
tion of the Brown test, all measures of comprehension yield correla- 
tions of .7 or better. Any single test of rate correlates .6 or better, 
and the oral reading test or a vocabulary test is about as high. 

The correlations with the composite of Rate are strikingly similar, 
most tests measuring one about as well as the other; the slight dif- 
ferences being largely due to the fact that the “comprehension” 
scores and the “rate” scores are included in their respective composites.
The correlation between these composites (which are independent
in content) averages .84 ± S. D. .08. A coefficient of correlation
does not enable us to say why this is so. It does not mean,
necessarily, that there is no distinction between the two abilities. 
It shows merely that for some reason the two, in the mass, tend to 
go together. It will be found later on, as a result of detailed analysis
of individual cases, that there is a real and useful distinction
between “ability to comprehend” and “rate of reading,” and that in 
such critical cases different tests yield very different scores, and for 
that reason, where measurement is conducted for purposes of care- 
ful individual diagnosis, a test of both “speed” and “comprehension” 
is essential. 


The results do not justify the conclusion that we have, in reading, 
a group of functions bound by some general factor. The zero cor- 
relations yielded persistently by every grade in the case of Brown’s 
test of comprehension is evidence to the contrary; likewise, the 
low correlations with Stanford-Binet Mental Age and with other
functions, e. g. spelling, not presented in this paper. Anticipating 
material to be presented later, it may be said that the correlation of 
reading with Mental Age becomes higher as we ascend the grades. 
This is not the case with the composite of Group Intelligence tests, 
which throughout yields a fairly high correlation with reading. It 
is imperative that the relation of reading to other abilities be dis- 
covered, and in our attempts we have found it specially instructive 
to treat the material by grade groups. Many important facts are 
lost in the massing of material as displayed in Table I. 


It is not possible to decide, from the data of Table I, what tests 
are to be preferred for purposes of measuring general reading ability 
and for a specific purpose (power of comprehension) what tests are 
most adequate. This most useful information can be secured only 
by studying the coefficients of reliability, the grade correlations with 
the composites, the intercorrelations among tests, and the use of 
many tests upon unusual types of readers. In the rough, several 
tests seem to measure the same thing; that is to say, they appear to, 
so far as can be discerned from coefficients of correlation, but when 
applied to the peculiar few, distinct differences in the measures 
appear. We are interested in tests for diagnostic purposes, and they 
become a means to that end only when we know exactly what they 
do measure. 
THE BROWN SILENT READING TEST.


In a manual of 57 pages,⁵ Brown defends his choice of material
and his methods of measuring speed and comprehension. The 
material is a rather interesting narrative of about 800 words, 
written in familiar English. There are three forms. The pupil 
reads silently for one minute, encircling the word last read. The 
words read per second are obtained by count, giving the conventional 
score for rate. The pupils are warned that they will be asked
at the end of reading to write as much as they can remember. 
No time limit is set for the reproduction. For scoring the papers, 
keys are provided, indicating the essential idea in italics and less 
important matter in plain type. The papers are first read to secure 
a measure of the “quantity of reproduction,” which is stated as 
the percentage which the amount recalled is of the amount read. 
The papers are then examined again, “and only those ideas counted 
which are entirely correct in every respect and of which every detail 
is reproduced.” This is called quality of reproduction and is stated 
as a percentage. The final comprehension score is the mean of the 
two. The labor involved in the scoring is surprisingly great and
the keys supplied by Brown are not without defects. 


On a priori grounds, objections could be raised to the method used 
by Brown, Courtis and others of measuring speed by a test which
provides no mechanical control of comprehension. In the Courtis 
test there is no certainty that children are maintaining a uniformity 
of care as regards comprehension ; in fact, there is no certainty that 
they are comprehending at all. There is the possibility that each 
child may adopt quite different degrees of care at different times. 
Brown probably secures more uniformity by virtue of the instruc- 
tion that the pupils will be asked to reproduce what they have read. 


The correlations between two rate tests with the Brown materials 
are: 


⁵Brown, H. A. The Measurement of Ability to Read, Department of Public Instruction,
Bureau of Research, Bulletin No. 1, 1916, Concord, N. H.
                  Coefficient       Coefficient
                  of Correlation    of Reliability

Grade III ........     .76               .86
      IV .........     .61               .76
      V ..........     .67               .80
      VI .........     .48               .65
      VII ........     .57               .72
      VIII .......     .60               .75
Mean .............     .61               .76
S. D. ............     .09               .09


The coefficient of Reliability⁶ in this case gives the degree, approximately,
that the combined results of two trials of the test would
correlate with the composite of two other trials. It gives a notion 
of how consistent the performance of children is. In this test the 
performances display but a moderately satisfactory degree of con- 
sistency, but a degree somewhat higher than that found for the 
Courtis test. These measures do not tell us anything about the 
validity of performance. They tell us only that, whether the child
reads at a rate consistent with understanding, or whether he skims, 
or reads with greatest care, he does it with a certain degree of con- 
sistency. The validity of the test, i. e. whether it yields a measure 
of real reading ability, can be discovered by a study of Table II, 
which contains the correlations with other criteria. 
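
The relation between the two columns of coefficients can be checked with a short sketch. It assumes the two-trial formula is the familiar Spearman-Brown form, 2r / (1 + r), which reproduces the legible pairs in the table (for example, .76 gives .86 and .61 gives .76); the function name and the general parameter `k` are illustrative additions.

```python
def spearman_brown(r, k=2):
    """Reliability of a test lengthened k times, given the correlation r
    between two single trials. With k=2 this is 2r / (1 + r).
    ASSUMPTION: this is the formula intended by the source's footnote."""
    return k * r / (1 + (k - 1) * r)


# Correlations between two single trials, taken from the table above.
for r in (0.76, 0.61, 0.67, 0.48, 0.60):
    print(f"r = {r:.2f}  reliability of two trials = {spearman_brown(r):.2f}")
```

The stepped-up coefficient describes the combined score of two trials, so it should not be read as the consistency of a single one-minute trial.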


From this table it appears that the Brown Rate Score agrees 
about as well with other measures of rate as it does with itself. The 
correlations range downward from .67 with the Monroe Rate, 
through Directions and Burgess to .53 with Courtis. The correlation
with the Composite of Rate is .82 ± S. D. .10, which is higher
than the Courtis, but not as high as the Monroe Rate. The correlation
of the Brown Rate with the corrected composite of comprehension
is .66 ± S. D. .10, which is, again, higher than Courtis, but not
as high as Monroe Rate. The correlations with the vocabulary tests 
are a little better than .4, and with Gray’s Oral a little better than 
.». The Correlation with Stanford-Binet is very low, averaging 
.17 ± S. D. .12, but higher with the group tests of intelligence,
.40 ± S. D. .21.





⁶See Brown, W. The Essentials of Mental Measurement. London. 1911. Pp. 101-2.
Reliability, two trials, = 2r / (1 + r).
The Brown comprehension score affords the single case of persistent
zero correlation with the various measures used. Its correlation
with the Brown rate is zero, as it is, approximately, with each
and every measure of rate, comprehension, vocabulary and intelligence.
Whatever this score does represent, it certainly is not comprehension
in reading unless all other measures are invalid, which
is scarcely likely. It may well be that a written reproduction of 
what was read during a minute is a useful exercise, and needs culti- 
vation, but it is not a measure of comprehension, at least when 
scored by Brown’s method. 


Brown suggests a composite score for general reading ability ob- 
tained by multiplying the score for rate by the score for compre- 
hension. Such a score has not been used in this study for obvious 
reasons. 


The presence of Brown’s score in the composite of comprehen- 
sion reduces its validity. It was eliminated, consequently, and the 
corrected criterion used throughout. The data for the uncorrected 
criterion are printed for whatever statistical interest they may 
possess. 


(To be continued.) 
CONSTANCY OF THE STANFORD-BINET I. Q.
AS SHOWN BY RETESTS. 


HAROLD RUGG AND CECILE COLLOTON 
The Lincoln School of Teachers’ College 


If the Stanford-Binet Intelligence Test is taken two or more times 
by the same pupils, how closely will the I. Q.’s agree? 

At least six reports are now available* from which the answer to 
this question can be formulated: (a) Terman (see Bibliography, 
No. 1); (b) Cuneo and Terman (2); (c) Garrison (3); (d) Poull 
(4); (e) Wallin (5) (d and e appear in this issue of the Journal of
Educational Psychology); (f) Fermon (6); (g) Stenquist (7).

This article will summarize and interpret the evidence reported by
these workers and add evidence secured in the educational psychology 
laboratory of the Lincoln School of Teachers College, 1920-1921. 
The data of the six investigations are summarized in Tables I and 
II. We have incorporated our own data in these tables on bases, so 
far as possible, which are comparable with those of other studies. 
The Binet testing in the Lincoln School was done as follows: Of the 
137 retests, Mr. Rugg gave 73 initial tests and Miss Anne Brown 64 
tests in the winter and spring of 1920. Miss Colloton gave 45 
initial tests in 1920-1921 and 121 retests. Mr. Rugg gave 16 retests. 


Our individual average differences are as follows:

                                      Average       Number of
                                      Difference    Retests

Mr. Rugg with himself ............      5.5**          16
Miss Colloton with Mr. Rugg ......      4.9**          59
Miss Colloton with Miss Brown ....      4.5            62


Constancy of the I. Q. can be expressed in three ways: (1) by the 
average difference between the initial and successive tests; (2) by 
the limits of the middle 50 per cent of the differences; (3) by the co- 
efficient of correlation between the successive tests. Table I pre- 
sents these facts for the seven studies. In these studies 1,487 re- 
tests are reported. All studies are recent—five, the work of the 
past year. 
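
The three ways of expressing constancy can be computed from paired first and second I. Q.'s as in the sketch below. The function name, the use of absolute differences for the average, and the quartile convention of the standard library are assumptions; the reports summarized here may have used slightly different conventions.

```python
import math
import statistics


def constancy_summary(first, second):
    """Return (average absolute difference, (Q1, Q3) of signed differences,
    Pearson r) for paired first-test and retest I. Q.'s."""
    diffs = [b - a for a, b in zip(first, second)]
    average_difference = statistics.mean(abs(d) for d in diffs)
    # Limits of the middle 50 per cent of the signed differences.
    q1, _median, q3 = statistics.quantiles(diffs, n=4)
    mean_1 = statistics.mean(first)
    mean_2 = statistics.mean(second)
    sxy = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    sxx = sum((a - mean_1) ** 2 for a in first)
    syy = sum((b - mean_2) ** 2 for b in second)
    r = sxy / math.sqrt(sxx * syy)
    return average_difference, (q1, q3), r


# Hypothetical I. Q.'s of four pupils on a first test and a retest.
avg, middle_half, r = constancy_summary([100, 110, 90, 120], [104, 108, 95, 121])
```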


*As shown by a search of the following magazines for the years 1915, 1916, 1917,
1918, 1919, 1920, 1921: Journal of Educational Psychology; Journal of Educational
Research; Journal of Experimental Psychology; School and Society; Training School Bulletin;
Psychological Clinic; Psychological Review, and Psychological Index. We will appreciate
information from any reader who knows of other published or unpublished
studies of Stanford-Binet Retests.

**These average differences become 4.9 and 4.4 respectively if cases are omitted in
which the pupils’ mental ability was not completely explored at the initial test. This
was caused by a rigorous following of directions.
TABLE I. SUMMARY OF INVESTIGATIONS ON RETESTS WITH THE STANFORD-BINET
The present answer to the question concerning constancy of I. Q.
The findings of Fermon and Stenquist are sharply distinguished 
from those of all other workers. Terman, Terman and Cuneo, Gar- 
rison, Poull, Wallin and the present writers report average differ- 
ences in I. Q. between first and second tests of approximately 5 
points I. Q. The investigations by Terman, Garrison and Poull, 
together with ours, represent 760 children. The average difference 
for these studies is closely 4.5 points I. Q. This means that the 
chances are approximately 20 to 1 that the I. Q. of a pupil reported 
from a single test (as measured in the Stanford-Binet with the care 
represented by these studies) is within 15 points of his true I. Q. 


Middle fifty per cent. For all studies the positive differences are 
nearly twice as large as the negative differences. Even so, the studies 
show that typical positive differences are less than 6 points. Typical 
negative differences are approximately 3 points. This means that 
the chances are one in two that an I.Q. from a single test will in- 
crease as much as 6 points or decrease as much as 3; that the 
chances are 1 in 5 that it will increase as much as 12, or decrease
as much as 6; that the chances are 1 in 20 that it will increase
as much as 18 or decrease as much as 9.


A significant fact therefore: much confidence can be put on a single
I. Q. if the examination is made by experienced and well-trained examiners
who use rigorously the standardized procedure for giving
the test. In a range of intelligence for large bodies of public school 
children of, say, 50 points (from 80 to 130 I.Q.) it is very helpful 
to be able to predict intelligence with as much precision as is im- 
plied by these figures. Furthermore, the giving of a retest in all 
doubtful cases will increase the stated degree of reliability by about 
40 per cent. That is, for two tests the P. E. becomes approximately
3 points.
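
The figure of "about 40 per cent" agrees with the usual rule that the probable error of an average of n independent measures is the single-measure probable error divided by the square root of n. The sketch below assumes that rule and independent errors; the function name and the single-test value of 4.5 points are illustrative only and are not given as a P. E. in the text.

```python
import math


def probable_error_of_mean(pe_single, n_tests):
    """Probable error of the average of n_tests measures.
    ASSUMPTION: errors of the separate tests are independent."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return pe_single / math.sqrt(n_tests)


# Hypothetical single-test probable error of 4.5 points of I. Q.
pe_two_tests = probable_error_of_mean(4.5, 2)
# Gain in precision from a second test: sqrt(2) - 1, i.e. roughly 41 per cent.
gain = math.sqrt(2) - 1
```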


Thus, the recent studies, except those of Stenquist and Fermon, 
closely confirm Terman in his earlier statements. 


We have studied the details of the reports by Fermon and Sten- 
quist. The latter are careful to state that the examiners who did the 
testing were carefully trained and had tested at least 20 pupils un- 
der critical supervision. The comparison of their findings with 
those of the other studies throws great doubt on the validity of the 
examining which was done by these workers. We are convinced that 
the great differences in I. Q. must have been caused primarily by 
non-uniform scoring of responses by those who gave the tests. 
Stenquist says, however, “it seems certain that the differing I. Q.’s 
obtained from the successive tests cannot be accounted for by the 
personal equation of examiners. They are probably due, on the one 
hand, to actual differences in the child from time to time, and on 
the other hand to the fallibility of the crude instruments with which 
we are measuring a most complex thing.” (He is careful to state 
that his criticism is of the Binet scale and the I. Q. as absolute
measures of intelligence.)


Study the charts presented as Table II. These present a very in- 
teresting and important comparison of the detailed distribution of 
differences in I. Q. The extreme differences in retest are important 
as well as the central tendencies. It is significant that four differ- 
ent groups of investigators, working independently, obtain differ- 
ences in retest of more than 10 points in less than one-sixth of the 
cases. In our own work no difference was greater than +17 or
−15; 12% were more than 10. In Terman’s, 67 out of 435, or 15%,
were greater than 10 points. Of our 137 retests 23 were greater
than 8. Eight of these can be definitely explained by the fact that
the first test did not completely explore the pupil’s mental ability. 
This raises an important point of technique,—that of not carrying 
the testing far enough to completely explore the pupil’s general 
mental ability. 

Only 6% of Garrison’s cases showed differences greater than 10. 


Average differences classified according to age of pupils. 

Table III classifies the differences by age levels. It shows that 
these differences are only slightly larger with very young children, 
especially below the entering school age of 6. It also shows that 
with no school children do the average differences exceed 7 points. 


(These conclusions ignore the data of Stenquist and Fermon, 
which, as indicated above, must be unsound.) For children of school 
age our data show that differences are not appreciably larger with 
the younger children, say, 6-9 years. In fact, the variation with age
in the retest differences may be neglected. This is contrary to the common
view of the matter. 
TABLE III.
Comparison of Average Differences Between 1st and 2nd Tests, Classified
According to Age Levels.

                  … to 5 yrs.   Average      … to 8 yrs.   Average      … to 11 yrs.   Average      …          Average
Investigator      11 mo.        Difference   11 mo.        Difference   11 mo.         Difference   and over   Difference

Terman               99           6.9          139           6.0          134            5.3           63        6.3
Garrison              0            -            12           3.6           49            4.7            1         -
Stenquist            28          13.5          198           7.7           48            6.9            0         -
Rugg-Colloton         0            -            51           4.5           50            5.5           36        3.7
TABLE IV.
Comparison of Average Differences Between 1st and 2nd Tests, Classified
According to Degree of Intelligence.

                  Bright             Average      Average          Average      Dull              Average
Investigator      Above 110 I. Q.    Difference   90-109 I. Q.     Difference   Below 90 I. Q.    Difference

Terman                183              5.8           147             6.2           104              5.8
Garrison               26              5.6            31             4.0             5              9.2
Stenquist             118              8.4           101             8.0            55              8.2
Rugg-Colloton          97              4.6            39         [illegible]         1               -
































Average differences classified according to degree of intelligence
of the pupils. Table IV gives the data. The conclusion is the same
as in the case of average differences classified on age levels: difference
in degree of intelligence seems not to be a factor. Differences
in retest will be approximately the same, irrespective of the
intelligence of the pupils.

In Table V we present the details of our retests thrown together
in one correlation table. We find a correlation of .84 between the
first and second test. Terman reports .93, Cuneo and Terman .95,
.94 and .85, respectively.

TABLE V.
Agreement Between 1st and 2nd Test. Rugg-Colloton, 137 Cases. Correlation
r = .84.

Thus, the new investigations tend to confirm Terman in his 1917 
conclusions and to give us much confidence in the constancy of the 
I. Q. as measured by the Stanford Revision of the Binet-Simon 
Scale.* 


*Nevertheless there is much to be done in improving the scale and, probably, in 
making new individual scales. We will present definite criticisms of the scale at a 
later time.

CONSTANCY OF I. Q. IN MENTAL DEFECTIVES, ACCORDING
TO THE STANFORD-REVISION OF BINET TESTS. 


LOUISE E. POULL. 
Psychologist, Children’s Hospital, Randall’s Island, New York City. 


The data of Table I were derived from retests of 126 inmates of 
Children’s Hospital, Randall's Island. The retests were made as part 
of the routine work of the institution and the cases are, therefore, un- 
selected, excepting that epileptics were excluded. The intervals be- 
tween the tests varied from six months to three years; the ages of 
the subjects from four years to 28 years; the I. Q.’s of the first test 
from 20 to 90. The records were made by trained psychologists, 
accustomed to the reactions of mental defectives. 


The data of the table show the plus or minus changes of the second 
test over the first in points of I.Q. It will be seen, on inspec- 
tion, that these subjects as a group did not deteriorate. 
The average change is an increase of + 1.28. The middle 50 per cent 
lie between —3.3 and + 4.8 variation. The Standard Deviation 
was found to be 5.83. 
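
As a rough check on these figures, the sketch below compares the reported quartiles with what a normal curve of the same Standard Deviation would give. It assumes, which the text does not state, that the changes are approximately normally distributed; the function and variable names are illustrative only.

```python
# Compare the reported quartiles with the probable error (P. E.) that a
# normal curve of the same S. D. would imply. Normality is an assumption.
PE_FACTOR = 0.6745  # quartile deviation of a unit normal curve


def probable_error(sd: float) -> float:
    """Return the probable error of a normal distribution with the given S. D."""
    return PE_FACTOR * sd


reported_sd = 5.83
lower_quartile, upper_quartile = -3.3, 4.8  # middle 50 per cent, as reported

theoretical_pe = probable_error(reported_sd)
observed_pe = (upper_quartile - lower_quartile) / 2  # half the interquartile range

print(f"P. E. implied by the S. D.:      {theoretical_pe:.2f} points")
print(f"P. E. from the reported range:  {observed_pe:.2f} points")
```

The two values fall within a fraction of a point of each other, which is what one would expect if the distribution of changes is roughly normal in form.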


It is significant that the curve does not differ from the one worked 
out by Terman from the records of unselected school children. (“The 
Intelligence of School Children,” p. 141 ff.) The indication is that 
mental defectives are not more variable than normal subjects. It 
is not assumed, however, that the question of the constancy of I. Q. 
is settled until further studies have been made covering regular 
intervals and including repeated tests of the same cases over a 
number of years. 


The injustice of disposing of persons under suspicion of mental 
defect on the basis of a single test is clearly demonstrated. A large 
percentage of the cases shows variations which operate to change 
the classification, and, in cases above the obvious imbecile type, only 
observation and re-testing can discover the individuals who require 
permanent supervision or institutional care. Composite ratings, 
including non-language and performance scales, together with the 
Stanford-Binet, have been found to give truer evaluations of problem 
cases, since they give weight to the manual abilities which some- 
times express a degree of intelligence hidden by language inhibitions. 

MENTAL GROWTH AND THE I. Q.


LEWIS M. TERMAN
Stanford University. 


The problems relating to native mental abilities and to the de- 
velopmental changes which come with increasing maturity are, and 
perhaps always will be, among the central problems of educational 
psychology. We can not adapt the curriculum to the child without 
fairly accurate knowledge of the mental abilities which the child 
for the time being possesses and of the abilities which are required 
to master given types of curricula. We can not intelligently plan 
a child’s later education without more or less dependable means of 
forecasting what abilities he will possess at a given time in the 
future. Anything that adds to our knowledge of mental growth is 
bound to be of great practical significance for education. Hence 
the immense popularity of the Binet tests, which for the first time 
made possible a fairly serviceable determination of the stage of 
intellectual maturity which a given subject had attained. The 
value of these tests was even further enhanced when it was dis- 
covered that the intelligence quotient maintains, during the growth 
period at least, a certain amount of constancy. In proportion as 
laws of intellectual development obtain, the door of the future may 
be opened; determination of the child’s present intelligence status 
will enable us to forecast, within certain limits of error, what man- 
ner of adult he will become. 


That rough prediction is now possible on the basis of intelligence 
tests can no longer be denied. For example, it is a fairly safe pre- 
diction that the child who has been competently tested by the Binet 
scale and found to have an I. Q. of 75 will never attain an I. Q. of 
125, or that an I. Q. of 125 will never, barring definite nervous dis- 
ease, drop to 75. No one would now expect a child with an I. Q. of 
60 or 70 to be able to graduate from an average high school or pur- 
sue a college course. These predictions are of course very rough, 
but it is worth something to know that in general there is even a 
tendency for the superior to remain superior, for the average to re- 
main average, and for the inferior to remain inferior. The child-
study literature of a decade or so ago gave wide currency to the view
that the typical genius was as a child stupid, and that intellectual
precocity is likely to be followed by post-adolescent stupidity! To
have progressed in ten years from this stage of ignorance to the
point where we can predict with a probable error of 4 or 5 points
what I. Q. a given child will have several years hence is a long step 
forward. 


There is no likelihood, however, that even this modest claim for 
the possibility of prediction will go unchallenged. In fact, insistent 
challenge has already come, chiefly from two sources. First, from 
teachers and others whose inclination is to believe in miracles and 
to look askance at so-called “laws of growth” on the basis of which 
we presume to forecast a child’s future. The acceptance of such 
laws is hindered by the deep-seated and blind faith that anything is 
possible for any child. To people who derive satisfaction from the 
fact that child nature contains so many unknown quantities, the 
suggestion that one’s final intelligence level may be predicted is 
actually repugnant. 

A challenge no less insistent has been voiced by a number of 
psychologists. It should go without saying that the questions raised 
by any psychologist who has seriously investigated the problem are 
entitled to a hearing. The issues are large enough to justify any 
amount of scientific caution. At the same time it is possible that 
over-zealous attack on a tentative hypothesis may be as inimical to 
true progress as its over-zealous and dogmatic support. I think it 
can be shown, for example, that some of the recent criticisms of the 
I. Q. are based on arguments and data so questionable that they are 
less likely to strengthen than to weaken the position they are in- 
tended to uphold. In the opinion of some, if the I. Q. can be demon- 
strated to have less than absolute constancy in a majority of cases, 
or to be markedly variable in selected individual cases, or to show a 
decrease with age in the case of feeble-minded subjects, or to be 
capable of misuse by the ignorant, it is ipso facto worthless and 
dangerous. 

It is not the purpose of the present article to defend the I. Q. 
Whatever merits or faults it may have as an index of an individu- 
al’s present or future intellectual status will sooner or later be de- 
termined by investigation. At present I wish only to examine some 
of the arguments and data relating to mental growth and the va- 
lidity of the I. Q. 

Let us consider first Dr. Doll’s recent monograph on The Growth
of Intelligence.* This study is based on repeated examinations, over 
a period of three to five years, of 203 feeble-minded subjects in the 
Vineland Training School. Of the entire number, 55 had been ex- 
amined at least once a year for five years and 72 at least every year 
but one for five years. In the case of 27, re-tests were continued 
only three years. Of the 203 subjects, 95 were below the age of 15 
years at the time of the initial test. Of these, 67 were followed for 
as much as three years prior to the life age of about 15½ years.


The life ages of these 67 at the initial test were distributed as fol-
lows:

Life age ........   6     7     8     9    10    11    12
No. .............   1   [three entries illegible]   8    13    11


The age range of the 203 subjects was from 6 to 66 years, the range 
of the initial mental ages from 1 to 10.7, and that of the intelligence 
quotients from 7 to 88. It should be borne in mind in the following 
discussion that only the data from these 67 cases can be regarded as 
significant for mental growth and I. Q. validity. The value of the 
data from even this small group is greatly impaired by the small 
number of subjects at each age. The fact that nearly half of the 67 
cases were below 50 I. Q. at the time of the initial test means that 
the study can throw little if any light on the mental growth of 
normal or merely backward children. 


The scale used for the first two years was the Goddard translation 
of the 1908 Binet scale. For the remainder of the investigation the 
1911 Goddard Revision was used. The earlier records were trans- 
lated, in so far as it was possible to do so, into terms of the Goddard 
Revision. The tests were given “by a large number of different ex-
aminers,” a part of them, it appears, by summer school students in
training.

Not all of the tests were complete or of sufficiently wide range, 
but a workable objective method was devised for computing mental 
age scores in such cases. For each subject the mental growth curve 
was based upon smoothed data, not upon individual examinations. 
For example, all the mental ages for a subject in the first two years 
were averaged and the resulting value was taken as the mental age 
at the mid-point for this period. Then the mental ages for the second
and third years were averaged and the result taken as the mental age
for the second mid-point, and similarly for the third and fourth
years, the fourth and fifth, etc. This method has its advantages,
but it also has the effect of shortening considerably the final growth
curve for each subject and of making it slightly flatter.

*Psychological Monographs, Vol. 29, No. 2, Whole No. 131, 1921, pp. 130.
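
The smoothing procedure just described may be sketched as follows. The yearly mental ages are invented for illustration and are not taken from the monograph; the function name is likewise hypothetical. The sketch shows only the shortening of the curve and the reduction in the total gain it displays.

```python
# Average each pair of successive yearly mental ages, as in the smoothing
# method described above. The input values are invented for illustration.
def pairwise_smooth(mental_ages):
    """Return the means of successive pairs: (year 1, year 2), (year 2, year 3), ..."""
    return [(a + b) / 2 for a, b in zip(mental_ages, mental_ages[1:])]


yearly = [5.0, 5.8, 6.1, 6.9, 7.0]  # five annual examinations (hypothetical)
smoothed = pairwise_smooth(yearly)

raw_gain = yearly[-1] - yearly[0]          # gain shown by the raw records
smooth_gain = smoothed[-1] - smoothed[0]   # gain shown by the smoothed curve

print(smoothed)
print(f"points on curve: {len(yearly)} raw, {len(smoothed)} smoothed")
print(f"total gain: {raw_gain:.2f} raw, {smooth_gain:.2f} smoothed")
```

Five examinations yield only four plotted points, and the first and last points are each drawn half a year toward the middle of the record, so the curve covers a shorter span and a smaller total gain than the raw records do.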

For the purpose of establishing average mental growth curves 
the author classifies his subjects according to the final mental age 
attained and gives us the average curves separately for the groups 
with final mental age of 1 year, 2 years, 3 years, etc. The resulting 
curves are relatively flat, showing that for these feeble-minded sub- 
jects there is a marked tendency for the I. Q. derived from the God- 
dard Revision to decrease. For those whose final mental age is 4 
or 5, there is relatively little mental growth after life age 11 and
little after 12 for those whose final mental age is 6, 7, or 8. Those 
who reach the mental age of 9 or 10 continue to develop until 16, 
according to the data presented, although the author calls it 15. In 
view of the scantiness of the data for each of the groups, these find- 
ings, while extremely interesting, can not be taken as final. 

The author discusses at length the question of age at which mental 
growth normally ceases. As he admits that his morons show im- 
provement up to 15, it is surprising to find him contending that 
the normal adult level of intelligence is reached at about 13 years. 
If so, the feeble-minded develop later than the normal, which is not 
only contrary to the generally accepted view, but also to the fre- 
quently reiterated opinion of the author. In fact, the author’s treat- 
ment of this subject is rather confused and self-contradictory. After 
presenting his 13½-year hypothesis (p. 9 ff.) he alleges in support
of it (p. 13) the argument that my assumption of the 16-year adult 
level is due to this being the “efficiency limit” of the Stanford Re- 
vision, which is of course irrelevant to the question. Then in order 
as given we find the following statements: 

P. 14. [The final arrest] “is probably no higher than 14.” 

P. 15. “There is reason to believe that the true age of average ar- 
rest of mental age growth is actually between 13 and 14.” 

P. 15. “This age [the age of final arrest for normal subjects] may 
be 15 or 16 or higher, but for reasons given may be provisionally 
placed at 13 years.” 

P. 16. “Observation leads one to believe that idiots are arrested
in their intellectual growth very early in life (say at about the life 
age of 5 or 6 years), that imbeciles are arrested at a somewhat later 
age (say about 10 or 12 years), and that morons are arrested still 
later (say about 15).” 


P. 59. “The ages for each [final] mental level at which all subjects
are arrested are as follows:

[Final] mental age level    1    2    3    4    5    6    7    8    9   10
Age of arrest ..........   10    8    6   11   12   15   15   12   15   14”


P. 68. “. . . The average rate of mental age increase of these 
feeble-minded subjects . . . reaches a practical minimum at 13 or 
14 years.” 

P. 76. [Morons] “are arrested about 15 years.” 

P. 84. Apropos of the age of growth cessation in the case of supe- 
rior children, “presumably the rate of growth would decrease after 
life age 13 years, since the final mental level of superior children is 
practically attained at that time.” 


P. 108. “Nearly all these subjects [feeble-minded] cease to de-
velop several years before age 16, the theoretical limit to which they
are expected to develop at a constant rate by the I. Q.” (Italics 
mine.) The last clause is of course entirely irrelevant, as the valid- 
ity of the I. Q. does not hinge upon growth ceasing at any particu- 
lar age. 

P. 118. “There is an age of arrest for every feeble-minded subject
which almost invariably is reached before 15 years of age. . . .”

Even the reader’s natural expectation of relief on coming to the 
author’s final summary is premature, for on two pages (127-128) 
we find the following three conclusions: “The more recent and exten- 
sive evidence suggests that the average adult level of intelligence is 
between 13 and 14.” “Significant mental age increases are limited 
to subjects under 15 years of life age.” “It [the annual rate of 
growth] reaches a minimum at about 13 years of life age.”* 

*It may interest the reader to know that the average of the above estimates is 13.83
years, and the mean deviation .69 years.

However, the author’s real adherence is to the 13-year hypothesis,
to which he seems to have been led chiefly by the results of army
mental testing. One may question whether he has not been too in-
clined to accept these results at their face value. The now famous
mental age of 13.4 found for 653 unselected white enlisted men tested 
by the Stanford-Binet in August, 1918 (or for that matter other 
army test results), does not, for the following reasons, indicate the 
life age at which mental growth ceases: 


(1) The Stanford-Binet may be somewhat too difficult in the up-
per ranges. For all anyone knows, the mental age score 13.4 ought
to be really 14.4. The error may be greater or less than this. 


(2) The 653 enlisted men for whom the average mental age 13.4 
was found cannot be taken as representative of the entire draft 
army. A disproportionate number of them were from the southern
and semi-southern states where, according to all the results of army 
mental testing, average intelligence is lower than in the northern 
and western states. 


(3) Just as this group may not have been representative of the 
entire draft, the draft army itself was certainly not representative 
of the male population between the ages of 21 and 31. Of 9,500,000 
registrants between these ages 6,973,000 were given exemption or 
deferred classification. For example, 67,000 agricultural “man- 
agers” and 61,000 agricultural “directors” and “comptrollers” were 
exempted. Those classified as “farmers” in the draft army were 
in the main farm laborers. This is only a sample of the kind of se-
lection that occurred all along the line. Probably only a small pro-
portion of men in positions of even minor responsibility were
drafted. The fraction of 1 per cent. exempted because of mental in- 
feriority was insignificant in comparison with the exemptions of 
skilled laborers, business men, and professional men. Furthermore, 
at the time the 653 men were tested (August, 1918) there were 
619,000 men in the military or naval service who had not been 
drafted, or between 15 and 20 per cent of the entire military and 
naval forces. There is reason to believe that these volunteers in- 
cluded a disproportionate number of college graduates, college stu- 
dents, recent graduates of high schools, high school students, and 
high-minded youths from the better classes generally. From such 
facts it is clear that the intellectual cream of the country between 
the ages 21 and 31 had been skimmed off several times before these 
653 drafted men arrived in camp. Nor should it be forgotten that 
all officers were excluded from this “enlisted men” group. 

(4) The conditions under which tests were given in the army were
in most cases far from ideal. The men had just reached camp. 
Doubtless many of them were bewildered or fatigued. Some were 
suffering from the effects of typhoid and smallpox vaccinations. 

(5) The scale used was an abbreviated Stanford-Binet, consisting 
of four tests in each age group. While this abbreviation yields 
scores which correlate very highly with scores from the entire scale, 
I think it can be shown that with adult subjects they tend on the 
average to run slightly lower. 


(6) A large proportion of the tests were given by examiners who 
had had little training in Binet procedure. My experience leads me 
to believe that partially trained examiners are more likely to score 
the tests too rigidly than too leniently.* 
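
The draft figures quoted under (3) can be worked through in a few lines. The sketch below uses only the numbers given in the text; the variable names are illustrative, and the total size of the forces is merely what the stated "between 15 and 20 per cent" would imply, not a figure from the source.

```python
# Arithmetic on the draft statistics quoted in item (3) above.
registrants = 9_500_000         # registrants between the ages of 21 and 31
exempt_or_deferred = 6_973_000  # given exemption or deferred classification
volunteers = 619_000            # in service in August, 1918, but not drafted

share_exempt = exempt_or_deferred / registrants
available_for_draft = registrants - exempt_or_deferred

# If the volunteers were between 15 and 20 per cent of all forces, the
# total forces implied lie between these two bounds.
implied_total_low = volunteers / 0.20
implied_total_high = volunteers / 0.15

print(f"exempt or deferred: {share_exempt:.1%} of registrants")
print(f"left for the draft: {available_for_draft:,}")
print(f"implied total forces: {implied_total_low:,.0f} to {implied_total_high:,.0f}")
```

On these figures nearly three registrants in four were exempted or deferred before any testing took place, which is the point of the argument about selection.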

My own 16-year estimate may be too high. As it was frankly ten- 
tative, I do not feel called upon to defend it. Fifteen years may be 
nearer the truth. Fourteen may be, but I doubt it. Anybody’s esti- 
mate at present is of course only guesswork. We may concede Dr. 
Doll a right to his own guess without admitting his claim to have 
overthrown the guesses of others. 


It will bear repeating that the author presents no data which 
throw any light on the age at which growth normally ceases, and 
that even his data for feeble-minded are, as far as this point is con- 
cerned, extremely scanty. Of his 203 subjects, only 95 were below 
15 at the time of the initial test; of these, only 67 were followed as 
much as three years prior to reaching age 15½; of the 67, 17 were
not followed beyond age 13; and of the remaining 50, a large ma- 
jority were of idiot or imbecile grade. 


Unless these facts are borne in mind the reader will be continually 
misled in regard to the amount of growth which may be expected 
of the feeble-minded. For example, the statement (p. 47) that “only 
6 subjects, or 3 per cent, have gained as much as two years in four
years of life” is seriously misleading. The 3 per cent figure is based 
on the entire 203 subjects, only 95 of whom were below 15 at initial 
examination. Of these, we could expect none to develop two years 
in four who were below 50 I. Q., even if the I. Q. remained constant.
This throws out 42 more, leaving only 53. But of those above 50
I. Q., we could expect none to add two years to the initial mental
age who had not at least four years before reaching the age of 15.
This throws out 14 additional cases, leaving only 39. However, we
must further eliminate all subjects who had a mental age of 8 or
more at the initial test, as the author admits that the extreme limit
of efficiency of the Goddard Revision is 10 years. This throws out
6 more, leaving 33. On this basis the 3 per cent. now becomes 18 per
cent. Mental growth of the amount indicated is six times as likely
to be encountered as the author’s statement would suggest.

*For some of the above facts, especially those relating to draft statistics, I have
drawn upon a memorandum which I addressed to Major Yerkes on January 27, 1919,
while the scientific report of the army mental testing was being prepared. The data
for the memorandum were secured from official reports to which I do not now have
access and cannot from memory locate, but I think there is no doubt about their essen-
tial accuracy.
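
The successive eliminations above reduce to simple arithmetic, sketched here with the counts given in the text. The sketch assumes, as the argument does implicitly, that all 6 subjects who gained two years fall within the group that remains; the variable names are illustrative.

```python
# Re-derive the 3 per cent and 18 per cent figures from the stated counts.
total_subjects = 203
gained_two_years = 6

eligible = 95                 # below age 15 at the initial examination
for removed in (42, 14, 6):   # below 50 I. Q.; too few years left; initial M. A. of 8 or more
    eligible -= removed

author_rate = gained_two_years / total_subjects
revised_rate = gained_two_years / eligible

print(f"eligible subjects: {eligible}")
print(f"author's rate: {author_rate:.1%}; revised rate: {revised_rate:.1%}")
print(f"ratio of the two rates: {revised_rate / author_rate:.1f}")
```

The ratio of the two rates is simply 203 divided by 33, or a little over six.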


We have seen that the author’s conclusions are not always in 
harmony with the raw data which he presents. It remains to point 
out that his original data are also misleading because of the intelli- 
gence scale on which they were based. The author raises this ques- 
tion, but while admitting that the Goddard Revision fails to dif- 
ferentiate above the mental age of 10 years, and that it has certain 
irregularities below this point, says that “none of these arguments 
seriously affects our results for the feeble-minded.” (P. 122.) Also 
(p. 125), “the worst we could expect from an admittedly imperfect 
scale would be to find irregularities in our average growth curves”;
and (p. 126) “from all these considerations we may conclude that 
the Goddard Scale is valid for our purposes, no matter from what 
angle it is viewed.” 


However, partly from previously published data of my own* and 
partly from a table of equivalents presented by the author (p. 124), 
I estimate that a subject whose I. Q. by the Stanford Revision re- 
mained steadily at 75 from age 8 to 15 years would have about the 
following Goddard I. Q.’s at the different life ages: 


Life age ..........   8    9   10   11   12   13   14   15
Stanford I. Q. ....  75   75   75   75   75   75   75   75
Goddard I. Q. .....  85   84   81   78   76   73   71   68


I also estimate that after Goddard mental age 7 a year of growth as 
measured by the Goddard Revision equals about 1.2 years by the 
Stanford-Binet. In fact, the three years of Goddard mental age 
from 7 to 10 are equal to nearly four years by the Stanford-Binet, 
since at 7 the Goddard Revision is about .9 of a year easier and at 
10 slightly harder than the Stanford-Binet.* The result, of course,
is an exaggerated flattening of the growth curves approximately as
indicated in Figure 1.†

Figure 1. Mental growth curves of the same individuals as shown by the Goddard
and Stanford Revisions. (Note: To save space the first three years have been omitted.)
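
The flattening may be made concrete by converting the estimated I. Q.'s into mental ages. The sketch below takes the Goddard I. Q.'s 85, 84, 81, 78, 76, 73, 71, 68 for life ages 8 to 15 as read from the table of estimates given earlier, and uses the ordinary relation mental age = I. Q. × life age / 100; the function name is illustrative.

```python
# Mental ages implied by a constant Stanford I. Q. of 75 and by the
# estimated Goddard I. Q.'s for the same hypothetical subject.
life_ages = [8, 9, 10, 11, 12, 13, 14, 15]
stanford_iq = [75] * len(life_ages)
goddard_iq = [85, 84, 81, 78, 76, 73, 71, 68]  # as read from the table of estimates


def mental_age(iq: float, life_age: float) -> float:
    """Mental age in years from I. Q. and life age."""
    return iq * life_age / 100


stanford_ma = [mental_age(q, a) for q, a in zip(stanford_iq, life_ages)]
goddard_ma = [mental_age(q, a) for q, a in zip(goddard_iq, life_ages)]

for age, s, g in zip(life_ages, stanford_ma, goddard_ma):
    print(f"life age {age:2d}: Stanford M. A. {s:5.2f}, Goddard M. A. {g:5.2f}")

stanford_gain = stanford_ma[-1] - stanford_ma[0]
goddard_gain = goddard_ma[-1] - goddard_ma[0]
print(f"gain from 8 to 15: Stanford {stanford_gain:.2f}, Goddard {goddard_gain:.2f}")
```

Over the same seven years the subject gains 5.25 years of mental age by the Stanford reckoning but only 3.4 by the Goddard, beginning about eight-tenths of a year higher and ending about a year lower.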


It is therefore impossible to accept the author’s statement (p. 77) 
that “figure 10 [showing average growth curves] lends color to the 
theory that the upper grades of feeble-mindedness have approxi- 
mately an average normal rate of growth early in life.” That his 
average mental age curves appear to support this theory is largely 
due to the fact that those which begin at mental age 5 to 7 years 
are displaced nearly a year upward. The author departs still 
farther from his data in the statement (p. 77) that “the average
high-grade feeble-minded subject is ‘at age’ early in life, and is only
potentially feeble-minded from the standpoint of mental age.”
(Italics mine.) This is probably true of some feeble-minded sub-
jects, but that it is the rule for any grade of mental deficiency
is not indicated by any data known to me. The well-known facts
regarding the late walking and late talking of a majority of feeble-
minded children suggest that what the author takes to be the rule
is decidedly the exception. As for the author’s data, only one of his
203 cases was below the age of 7 years at the time of the initial
test.

*Terman and Knollin: Some Problems Relating to the Detection of Borderline Cases
of Mental Deficiency. J. of Psycho-Asthenics, Vol. 20, 1915, pp. 1-15.

†Thorndike (Psychological Clinic, 1914, 8, 185-189) shows that the Goddard mental age
norms do not even fit the life ages of the children on whom they were based, being con-
siderably too high in the lower range and considerably too low in the upper range. This
is in fair agreement with my own data.

The author devotes 42 of the 130 pages of his monograph to a 
“Critique of the I. Q.” While admitting that “the I. Q. is valuable
as a measure of relative brightness” and that it is “superior to mere 
difference between age and mental age as a measure of retardation” 
(p. 89), he concludes that it is so lacking in constancy as to be mis- 
leading and worthless for purposes of forecasting later mental dev- 
elopment or “as a means of classification of such significant in- 
tellectual types as feeble-minded or gifted children.” He states 
that “only 1 subject out of a total of 106 feeble-minded subjects who 
were below 16 years of age at the first examination maintains an 
I. Q. which is in accord with the theory that the I. Q. is constant.”
(P. 118.) My own view on the subject, he thinks, is not warranted 
by the facts. 

Notwithstanding his conclusions, the author presents no facts 
which contradict my own findings. On the contrary, his data for 
feeble-minded subjects agree much more closely with those I have 
found for normal children than I should have expected. For the 
author’s 95 cases who were below the age of 15 at the initial test I 
have computed the correlation between initial I. Q. and that found 
at the end of three years. I have chosen a three year period for the 
comparison because a fourth of his subjects below 15 were not re-
tested for more than three years. The author does not give the
I. Q.’s and I have not been able to compute them with perfect
accuracy for the reason that the initial ages are given only in whole
numbers as 7 years, 8 years, etc. I have therefore treated his 7
year group as though all were 7½, the 8 year group as 8½, etc.
The resulting error would presumably be in one direction as often
as the other and would not affect the correlation except slightly to
lower it. Even so, the correlation, as shown in Table 1, is .963!
While it is true that the I. Q.’s for these subjects tend to decrease,
as is generally admitted to be the case with feeble-minded subjects,
this does not interfere with the use of the I. Q. for purposes of
prediction. For those who were re-tested for three years before
reaching the age of 15½ years the central tendency is toward a drop
in that time of 8 points. The upper quartile of changes is at —4.3
and the lower quartile at —11.2.

Table 1. Showing correlation between initial I. Q. and I. Q. at end of three
years for 95 subjects under age 15. (r = .963).

I have also computed the correlation between the initial I. Q. and 
the I. Q. at life age 15 (or with the last test when that was given 
prior to age 15). The coefficient is only a trifle lower than that 
shown in Table 1, namely .935. The correlation found by me for
428 repeated tests of normal children was almost exactly the same,
namely, .933.


In further support of his contention that the I. Q. is of little
value, the author cites the results of re-tests of borderline subjects
by the N. Y. State Board of Charities.* From the original data of this
report I have computed for the 49 subjects re-tested the agreement
between first and second tests, which were separated in most cases
by a year to a year and a half. The correlation, as shown in Table
2, is .905. Only one marked case of disagreement stands out in the
line of relation, and that is for a 5-year-old subject whose first
mental age was 3.4 and who a year later tested at 5.3. Purely
chance errors of this extent of course sometimes occur, particularly
in testing young subjects, with whom it is often difficult to establish
suitable rapport.

Table 2. Showing correlation between repeated tests. Rome (N. Y.) children.
(r = .905).

*N. Y. State Board of Charities: Second Report on 52 Borderline Cases in the
Rome State Custodial Asylum. 1915, pp. 32.


Instead of the I. Q. having no value for prediction, as the author 
thinks he has demonstrated, the very data on which he bases this 
conclusion show that even for feeble-minded subjects the I. Q. from
three to five years hence can be predicted with a P. E. of less than 
4 points, or that the final mental age which a subject will attain can, 
by use of the I. Q., be predicted from three to five years in advance 
with a P. E. of 4 to 6 months. The author (p. 55) explains his 
omission of statistical treatment on the grounds that “statistical 
devices, such as expressions of central tendency, coefficients of 
variability, and co-efficients of correlation obscure rather than clar- 
ify the results.” 
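
The connection between a correlation of this size and a probable error of prediction may be sketched with the usual formula for the probable error of estimate, P. E. = .6745 σ √(1 − r²). The text does not state the standard deviation of the I. Q.'s in these groups; the value of 15 points used below is purely an assumption for illustration, as is the function name.

```python
import math

# Probable error of estimate for predicting a later I. Q. from an earlier one.
# The S. D. of 15 points is an assumed value, not a figure from the text.
PE_FACTOR = 0.6745


def pe_of_estimate(sd: float, r: float) -> float:
    """Probable error of a prediction made from a correlation r."""
    return PE_FACTOR * sd * math.sqrt(1 - r * r)


assumed_sd = 15.0
for r in (0.963, 0.935, 0.905):  # correlations reported in the text
    print(f"r = {r:.3f}: P. E. of estimate = {pe_of_estimate(assumed_sd, r):.2f} points")
```

Under that assumption the correlations of .963 and .935 give probable errors of under 4 points; a wider spread of I. Q.'s in the group would raise them in proportion.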

The author’s main criticism of the I. Q. is based on the fact that 
the rate of intellectual growth, as indicated by two successive tests, 
is not predictable by its use. For example, if two children are tested 
and found to have widely differing I. Q.’s (one 70 and the other 
90, say), re-tests a year or two later may disclose a larger mental 
age increase for the 70 I. Q. subject than for the other. Therefore, 
the predictive value of the I. Q. is nil. However, the author over-
looks the important fact that an I. Q., or any other kind of intel-
ligence score, has a considerable probable error. The argument
assumes that the score is a perfectly accurate measure of the thing
it purports to measure. As Otis has shown,* the P. E. of a Stanford-
Binet score for a group of adult delinquent and “hobo” subjects of
average mental age 13 or 14 is approximately 5½ months in terms
of mental age, or about 3 points in terms of I. Q. With first grade
school children of average mental age 6½ years another of my
students has found it to be about 3 months. Since these subjects
are young, the P. E. in terms of I. Q. is again about 3 points. With
another miscellaneous group the P. E. of I. Q. was a little less than
4 points. Accordingly, an I. Q. of 70 really means 70 ± 3 or 4.**
Let us suppose that two ten-year-old children both of true I. Q. 100
were tested, and let us suppose the I. Q.’s found were 97 and 103,
which would mean mental age scores of 9.7 and 10.3 respectively. Let
us suppose that a year later their true I. Q.’s are still 100, and their
true mental ages are 11, but that re-test brings a reversal of the
error, giving 103 and 97 I. Q. respectively. Their mental age
scores would now be 11.33 and 10.67. The first subject would appear
to have gained 1.63 years and the latter .37 of a year. Doll’s
argument would assume that one had gained more than four times as
rapidly as the other. The point is, of course, that no intelligence test yet
devised can legitimately be used for measuring the rate of growth
over relatively short periods, since the P. E. of a mental age score
is itself 25 to 50 per cent (varying with age) of the normal amount of
growth for a year. If the author were to test a group of subjects
on two successive days, as one of my students has done, he would 
find almost as large and as frequent I. Q. changes as his data show 
for tests separated by a year. If in such an experiment a subject
were found to have gained a half year in mental age, surely Dr. Doll
would not attribute this to an over-night spurt in mental growth.
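
The illustration of the two ten-year-olds can be recomputed directly from the I. Q.'s supposed in it. The sketch below uses the relation mental age = I. Q. × life age / 100; the function and variable names are illustrative. Working from the I. Q.'s as stated, the apparent gains come to about 1.63 and .37 of a year, against a true gain of one year for each child.

```python
# Two children of true I. Q. 100, tested at 10 and again at 11, with the
# error of measurement reversed between the two tests.
def mental_age(iq: float, life_age: float) -> float:
    """Mental age in years from I. Q. and life age."""
    return iq * life_age / 100


first_child = (97, 103)    # I. Q. found at age 10, then at age 11
second_child = (103, 97)

gain_first = mental_age(first_child[1], 11) - mental_age(first_child[0], 10)
gain_second = mental_age(second_child[1], 11) - mental_age(second_child[0], 10)
true_gain = mental_age(100, 11) - mental_age(100, 10)

print(f"apparent gain, first child:  {gain_first:.2f} years")
print(f"apparent gain, second child: {gain_second:.2f} years")
print(f"true gain for each:          {true_gain:.2f} years")
```

An error of only 3 points in each direction is thus enough to make two children of identical growth appear to differ several-fold in rate.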


It is the same fallacy which accounts for the author’s argument 
that the I. Q. is misleading because, being a function of the entire 
life age, it “irons out” significant mental growth changes by making 


*J. of Educational Research, March, 1921. The publication of this study, which was 
made in 1916, was delayed by the war. 


**The author’s mental ages, being averages of two or more tests, would have a some-
what smaller P. E. than this, probably a little over 2 points.
them a fraction of a large unit. On this ground the author argues
that an I. Q. change of even 5 points is very significant, indicating
as much as a halving or doubling of the mental growth rate in the
interval between the tests. Thus far the author’s fallacious reason-
ing may be accounted for by his neglect to take the probable error
into account. Even apart from the probable error, however, the
argument is unsound. What it amounts to is this: the I. Q. is
blamed because even a 50 per cent increase or 50 per cent decrease
in growth rate does not in a short interval greatly alter its value.
Why should it? If this could occur the I. Q. would no longer be an
index of brightness at all. A 50 per cent gain or loss in rate of
mental growth for one year, even if such gain or loss were genuine
and not a mere accident of score unreliability, would not greatly
alter a 10-year-old’s brightness status with reference to the norm
for his age. It is of course just this brightness status for which the
I. Q. is intended to serve as an index.
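
A worked example may make the point concrete. The sketch below assumes a 10-year-old of I. Q. 100 whose mental growth during the following year runs at one and a half times, or at half, the normal rate; the function and parameter names are illustrative and not taken from the text.

```python
def iq_one_year_later(life_age, iq, rate_multiplier):
    """I. Q. after one year in which mental growth runs at
    rate_multiplier times the rate that would hold the I. Q. constant."""
    mental_age = iq / 100 * life_age
    normal_yearly_growth = iq / 100          # years of mental age per year
    new_mental_age = mental_age + rate_multiplier * normal_yearly_growth
    return 100 * new_mental_age / (life_age + 1)

faster = iq_one_year_later(10, 100, 1.5)   # mental age 11.5 at life age 11
slower = iq_one_year_later(10, 100, 0.5)   # mental age 10.5 at life age 11

print(round(faster, 1), round(slower, 1))
```

On these assumptions a genuine 50 per cent change in growth rate for a single year moves the I. Q. by only about 4.5 points, which is of the same order as the probable error of a single test.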
The author takes exception to my statement that the fairly con-
stant variability in the I. Q. distribution at different ages contradicts
the traditional view that variability in mental traits increases
toward adolescence. His criticism is based on the fact that there
is greater age overlapping in intelligence as maturity is approached
and that this is only an expression of the increasing variability.
However, in common with Bobertag, Stern, Kuhlmann and others,
I have myself pointed out this increase in age overlapping. My
statement regarding variability was of course based on the fact that
I consider mental age a misleading unit in which to express vari-
ability. Surely, the child of 12 years with a mental age of 11 does
not vary as much from normal as the 3-year-old who has a mental
age of 2.
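
The comparison can be put in figures. In the minimal sketch below (names illustrative), both children are one year behind in mental age, yet the ratio of mental age to life age differs widely.

```python
# (mental age, life age) in years for the two children in the example
cases = {
    "12-year-old, mental age 11": (11, 12),
    "3-year-old, mental age 2": (2, 3),
}

results = {}
for label, (mental_age, life_age) in cases.items():
    retardation_years = life_age - mental_age    # the same for both: 1 year
    iq = 100 * mental_age / life_age             # differs widely
    results[label] = (retardation_years, iq)
    print(label, retardation_years, round(iq))
```

Expressed in years of mental age the two deviations are identical; expressed as I. Q. they are roughly 92 and 67.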
The author also criticises my statement that “the mental age of
a subject is meaningless if considered apart from chronological age”*
in such a way as would lead the reader to assume that I consider
mental age of little significance as compared with the I. Q. He
proceeds, as though refuting my view, to show that it is mental age,
rather than I. Q., which determines the school grade or the kind of
vocational employment which is suitable to a given subject at a given
time. In view of the fact that I have written two books largely to
show the value of mental age as a basis for school grading, and have
many times pointed out that the I. Q. is useful as an index of bright-
ness, but not (apart from age) of ability level, the author’s critic-
ism hardly seems fair.

*The Measurement of Intelligence, 1916, p. 68.

Again (p. 16), “Terman maintains that mental growth develops 
at a constant rate for all degrees of brightness and dullness (except 
idiots and low-grade feeble-minded)”. Here the author refers to 
my book, The Intelligence of School Children. If he will read this 
again (Chapter 9) he will see that I simply state what is true for 
the data I offered, namely, re-tests of 315 children of whom only 31 
were below 80 I. Q. On p. 147 of the same book I expressly state 
that feeble-minded children testing below 60 may be less likely to 
hold their own than those of milder degrees of defect. On p. 150 
I warn the reader against accepting the I. Q. as infallible and state
that “in pathological subjects it may undergo large fluctuations.” 
On p. 154 I state that we could hardly expect the I. Q. to remain 
absolutely constant even if it were based upon a perfectly accurate 
scale, which I expressly pointed out we do not have. 

Other inaccuracies include the following: 

P. 6. “It appears from Terman’s tables of standardization stat-
istics that he did locate the single tests according to the general
principles employed by Binet” (i. e. by the 75 per cent rule). Here
the author refers by reference number to my monograph on the
Stanford Revision.* However, in Table 43 of this monograph the
per cents holdings for the Stanford Revision are explicitly shown to
decrease gradually from an average of 77 per cent for the tests of
year 4 to less than half this amount at the upper end of the scale.
Elsewhere (p. 10) the author, contradicting his other statement, 
gives me credit for ignoring the 75 per cent rule, but asserts that 
the results are nevertheless the same. Still later (p. 124) he presents 
a table which shows the results are far from the same, a Goddard 
mental age at 5 or 6 years being shown to be nearly a year higher 
than my own and at 10 no higher. 

P. 7. “It [the Binet scale] is based on the arbitrary assumption 
that increments in mental age from year to year are equal in 
amount.” If “equal” here means anything it means equal in terms 
of some kind of absolute units. Of course we have no such units 
and are not likely to have soon. As a matter of fact the Binet type 
of scale does not necessarily presuppose equality of mental age steps. 





*Warwick and York, 1916. 
P. 70. [There is] “a tendency for the lower mental ages or the
more retarded subjects to show larger rates of increase than the
higher mental ages or the less retarded subjects.” I have calculated,
as well as I could from the author’s data, the relation between mental
age increase and I. Q. for the 67 subjects who were followed for as
much as three years before life age 15½ (or thereabouts). The
results are as follows:


I. Q. range ........................   0-29       30-54       55-88
Number of subjects .................   7 [?]      26          33
Approximate average I. Q. ..........   20         41          70
Expected gain in 3 years ...........   .6 year    1.2 years   2.1 years
Average gain found .................   .2 year    .8 year     .9 year
Ratio, found to expected ...........   1/3        2/3         3/7


The important fact here is that those below 30 I. Q. tend to develop 
at 1/3 the expected rate, those from 30 to 54 I. Q. at 2/3 the expected 
rate, and those above 55 at 3/7 of the expected rate. The last fraction 
however, is too low, due to the fact that the mental age range from 
7 to 10 on the Goddard Revision represents more nearly four years 
of mental growth than three. 
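
The ratios can be reproduced with a short sketch, assuming that the expected gain is simply (average I. Q. / 100) × 3 years. The found gain for the lowest group is taken here as .2 year, the value implied by the stated ratio of 1/3; the variable and function names are illustrative.

```python
YEARS = 3

# Approximate average I. Q. and average gain found (years of mental age).
groups = {
    "0-29": {"avg_iq": 20, "found": 0.2},    # found gain inferred from the 1/3 ratio
    "30-54": {"avg_iq": 41, "found": 0.8},
    "55-88": {"avg_iq": 70, "found": 0.9},
}

def expected_gain(avg_iq, years=YEARS):
    """Mental-age gain expected if the I. Q. stayed constant."""
    return avg_iq / 100 * years

ratios = {name: g["found"] / expected_gain(g["avg_iq"]) for name, g in groups.items()}

# Correction suggested for the highest group: if the range 7 to 10 on the
# Goddard Revision represents about four years of growth rather than three,
# the found gain is scaled by 4/3.
adjusted_high = (groups["55-88"]["found"] * 4 / 3) / expected_gain(70)

for name, ratio in ratios.items():
    print(name, round(ratio, 2))
print("55-88 adjusted:", round(adjusted_high, 2))
```

The three ratios come out near 1/3, 2/3 and 3/7, and the adjusted figure for the highest group is 4/7, the alternative value mentioned in the discussion of p. 101.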

P. 101. “The individual average annual rate of development for 
these subjects in most cases is less than 20 per cent. This in spite 
of the fact that most of the first I. Q.’s of these subjects [those under 
15 years at initial test] range above 50.” It appears, however, that 
nearly half of the first I. Q.’s were below 50, and that the average
annual rate of development for those above 30 was for the middle 
group two-thirds of the expected and for the high group three- 
sevenths (perhaps actually four-sevenths) of the expected. 

Dr. Doll also gives the results of Miss Gillingham’s retests of 35 
superior children with I. Q.’s above 110. With regard to growth 
irregularity and I. Q. constancy the author’s conclusions from these 
tests are in line with those based upon tests of the feeble-minded and 
as little supported. However, he presents the hypothesis that the
I. Q.’s of superior children tend to increase rather than decrease.
“We have found [for the 35 superior children of the ages 10, 11, and
12] the average rate of growth to be distinctly higher than the
average I. Q., showing a tendency for the superiority to increase.” I find
from his data that the central tendency of I. Q. change for these
subjects was -2 points. I also find for these subjects a negative
correlation of -.474 between I. Q. at first test and amount of im-
provement. That is, the brighter the superior child the less likely
is the I. Q. to increase, which is contrary to the author’s hypothesis.
Of the 27 below 135 I. Q. at first test, exactly two-thirds showed 
an increase; of the 8 above 134, only one-fourth. With a highly 
selected and relatively narrow range group such as we have here, a 
negative correlation between first and later tests would be a natural
effect of the probable error of the scale. That is, the higher the
initial I. Q., the greater the chances that it is in error on the positive
side; the lower the initial I. Q., the greater the chances that it is 
lower than it ought to be. However, the author’s data on superior 
children are too scanty to warrant any conclusions whatever, and 
in so far as they indicate anything it is the reverse of what his 
hypothesis lays down. 
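
That a negative correlation between initial I. Q. and improvement follows from score error alone can be illustrated by a small simulation. The sketch below is hypothetical throughout: it assumes a narrow-range group of superior children whose true I. Q.’s do not change at all, a probable error of 3 points on each test (standard deviation = P. E. / 0.6745), and normally distributed errors. None of the figures are the author’s data.

```python
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=5000, true_mean=125.0, true_sd=8.0, probable_error=3.0, seed=1921):
    """Correlation between first-test I. Q. and apparent change when the
    true I. Q. is constant and only measurement error varies."""
    rng = random.Random(seed)
    error_sd = probable_error / 0.6745
    first, change = [], []
    for _ in range(n):
        true_iq = rng.gauss(true_mean, true_sd)
        test_1 = true_iq + rng.gauss(0, error_sd)
        test_2 = true_iq + rng.gauss(0, error_sd)
        first.append(test_1)
        change.append(test_2 - test_1)
    return pearson(first, change)

r = simulate()
print(round(r, 2))
```

Under these assumptions the correlation is clearly negative, in the neighborhood of -.3, even though no child’s true I. Q. has changed. A narrower range of true ability or a larger probable error makes it more strongly negative.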

In general, if one would know what the author’s data really show 
it is always necessary to determine this for one’s self from his tables 
of results. His own conclusions are so often either contrary to his 
facts or else irrelevant to them that verification is always necessary. 
One is tempted to offer the injunction caveat lector. 


(To be continued in October.) 
This department has a two-fold function. It aims to serve research
workers as well as educators, whose work brings them in close contact with 
children in the schools. It hopes to accomplish this service by suggesting 
research studies, which will meet well-defined school needs. 

In order that this service may be real and effective, the co-operation of 
research workers and school people is desired. Correspondence with reference 
to the following questions will be considered in selecting topics for future
discussions.

a. Which of the studies proposed would help you to solve a practical
problem?

b. What topics might well be added to this list? Replies may be
addressed to: Miss Laura Zirbes, 646 Park Ave., New York City.


What is the maximum age and ability range of an effective pupil 
group? What range is most desirable in a class? Within what 
limits is difference in age or ability immaterial? Is the range the 
same for all school subjects, or different for content and tool subjects, 
for problem solution, construction projects and drill? What ages 
or grades combine best in working groups? How great is the dif- 
ference in efficiency among groups of different ranges? 

These are important questions for rural schools. There are some 
two hundred thousand one-teacher schools in the United States.
The total enrollment of each tends to be small, though all the element- 
ary grades may be represented. For economy of time, and for social 
ends as well, it is desirable to make fewer groups than the number 
of grades calls for. To what extent is such grouping desirable also 
from the standpoint of efficient instruction? 

FANNIE W. DUNN, Teachers College.


The effect of textbooks on the outcomes of instruction. In another 
part of this issue reference is made to a study of the effect of text- 
books on methods of instruction. This would indeed be a fruitful 
topic for further investigation and research, if the making of school 
texts is to proceed scientifically and if the hope for improved in- 
struction is to be based on improved texts. 

The rigid adherence to school texts demanded of high school
pupils in mathematics, history and literature, and in the sciences is
certainly not warranted by the psychological validity of the texts in
question. A similarly rigid adherence to book-made plans in the
elementary school is incompatible with the needs of particular classes
and individuals and interferes with the spontaneity and flexibility
which characterize teaching based on the actual experiences, interests
and purposes of the pupils in question. The teacher who notes the
reactions of her pupils, and tactfully endeavors to adjust her work
to the situation is not as rare as the one who combines with this
type of leadership a clear vision of the aim and purpose of each
lesson in relation to general educational outcomes, and a knowledge
of the psychological conditions necessary to the realization of these
ends. Research along these lines will be of more practical value
than assistance in the actual outlining of lesson plans for class-room
use.


At present, textbooks differ in organization and content, form 
and purpose, depending on the point of view and experience of their 
authors, and the demands of the publishers and the public. Whole 
sets of readers are recommended and adopted because of their ex- 
tensive study helps. Investigation shows that the study helps are 
not used. Books and lessons made to teach the mechanics of reading 
are used as texts in language or literature. Music is given place in 
the curriculum for its aesthetic value but is taught by the aid of 
books which stress formal and technical drills and even make songs 
subservient to these purposes. Subjects retained in the curriculum 
for supposed disciplinary values are taught by strict adherence to 
texts or methods which have never been subjected to experimental 
evaluation and whose outcomes are of doubtful educational value. 

Thus textbooks often prevent purposes, hamper learning and
perpetuate obsolete educational ideals; or, by partial adjustments
of content and method, seek to become sufficiently expurgated and 
imbued with modern ideas to pass muster. 


No doubt, the publication of texts is more remunerative than the
investigation and evaluation of bodies of material and methods of
presentation as means for the attainment of worthwhile purposes.
There is, nevertheless, great practical value in the impartial analyses
of current methods and results. Permanent contributions to educa-
tional practice and literature can hardly be compiled without careful
studies covering the social worth of their content, the psychological
implications of their organization, and the degree to which they
serve adequate educational purposes. Only a small number of texts 
now in use are the result of scientific investigation and research. 
No doubt, these will maintain their standing and effectiveness, and 
raise the standard of other materials offered for publication. 


L. Z. 
EDUCATIONAL TESTS.


The Measurement of Language: What is Measured and its Significance. 
Ernest J. Ashbaugh, Journal of Educational Research, 1921, June, 32-39. 


Analysis of various scales for the measurement of language; their merits and 
their limitations. 


Scale of Attainment No. 2—An Examination for Measurement in History, 
Arithmetic, and English in the Eighth Grade. S. L. Pressey, Journal of Edu- 
cational Research, 1921, May, 359-369. Description of the examination: state- 
ment of norms for the examination as a whole and for each test: and discus- 
sion of the usefulness of the scale in comparing graduation standards. 


A Handwriting Scale for the Pupil. Frank Freeman, Elementary School 
Journal, 1921, June, 755-761. Description of a series of scales to enable the 
pupil to measure his own handwriting. Details of construction of scale. 


Graphical Representation of Grades of High-School Pupils. Elbert Allen,
School Review, 1921, June, 467-471. Description of a set of three cards for 
graphic representation of test grades, (1) card for class graph; (2) individual 
score card; (3) card for comparison of 4 specific types of graph. 


INTELLIGENCE TESTS. 


Fluctuation of Intelligence Quotient. S. C. Garrison, School and Society,
1921, June, 647-649. Results of retests on 62 children in the Peabody Demon- 
stration School. 


The National Intelligence Tests. Guy M. Whipple, Journal of Educational
Research, 1921, June, 16-31. Brief description of the history of the tests, the
aims of the makers, the criteria that were observed, and the results that are
being obtained.


The High Cost of Testing. S. L. Pressey, Elementary School Journal, 
1921, June, 771-777. Discussion of three practical criteria to be considered by 
superintendents in selecting tests. 


The Intelligence Test and the Teacher. Otto W. Haisley, Elementary School 
Journal, 1921, May, 703-707. Results of an Intelligence Test given to all 
teachers in the school system at Niles, Michigan. 


Intelligence Tests in Classifying Children in the Elementary School. 
Charles Fordyce, Journal of Educational Research, 1921, June, 41-43. Study
of the results of the Haggerty Intelligence Examination in comparison with the 
school grades and estimates of teachers in the case of 1078 pupils in the Ele- 
mentary grades at Lincoln, Nebraska. 
The Reliability of Test Scores. Truman L. Kelley, Journal of Educa-
tional Research, 1921, May, 376-379. Critique of nine methods used in meas-
uring reliability of tests with a recommendation for establishing a stan-
dardized procedure.


Suggestions Looking toward a Closer Contact with Practical Problems in
Work with Educational Tests. S. L. Pressey, School and Society, 1921, June, 
710-716. A caution against the indiscriminate use of statistical procedure with- 
out reference to its application to the data and problems in hand. 


The Intelligence Examination for High School Freshmen. Ira J. Bright, 
Journal of Educational Research, 1921, June, 44-55. The Terman Group Test 
of Mental Ability in comparison with teacher’s marks in Latin, English, Al- 
gebra, and Handicraft Subjects. Using the test as a basis for organization 
of class groups. 


Standardizing Tests for Vocational Guidance. James Burt Miner, School
and Society, 1921, June, 629-633. Making tests useful for vocational place-
ment by (1) the measurement of occupational types and (2) the measurement
of the most stable workers within an occupational group.


Reclassification of Children on Basis of Tests in Port Clinton Schools. A.
F. Meyers, Journal of Educational Method, 1921, Sept. 24-25. Classification 
of children on basis of intelligence and achievement tests, and establishment 
of mid-year promotions. 


The Freeman-Rugg General Intelligence Tests as an Aid to Economy in
School Administration. Ray H. Bracewell, The School Review, 1921, June,
460-466. The classification of Freshmen pupils in ability groups on basis of
Freeman-Rugg tests in the Burlington, Iowa High School.


Psychological Clinics in the United States. Leta S. Hollingworth, T. C. 
Record, 1921, May, 221-225. History of the Psychological Clinic and its present 
status in the U. S. 


Some Results from a Testing Program in Idaho. I. N. Madsen, School and
Society, 1921, June, 668-671. Results for Idaho Schools in Haggerty In-
telligence Examination, Monroe’s Silent Reading Tests, and Monroe’s Rea-
soning Tests in Arithmetic.


Group Mental Testing in Altoona, Pa. Caroline E. Myers, Garry C.
Myers, S. H. Layton, School and Society, 1921, May, 624-628. Results of
testing 6,774 children of the elementary schools of Altoona, Pa., with the Myers
Mental Measure.


A Program for Lowering the Percentage of Failures. Harlan C. Hines, 
School and Society, 1921, May, 582-584. The use of Terman’s Group Test of 
Mental Ability in Los Angeles public schools. 

Norms for the Seguin Form-board. Based on the averages for three
trials. J. E. Wallace Wallin, Journal of Delinquency, 1921, May, 381-386.

Presenting Educational Measurements so as to Influence the Public Favor-
ably. Carter Alexander, Journal of Educational Research, 1921, May, 345-358.
Talking points on measurement for convincing taxpayer of educational needs
and securing better school support.

Minor Studies from the Psychological Laboratory of Indiana University.
VIII. A Preliminary Investigation of General Prognosis; i. e., General Intelli-
gence. Journal of Applied Psychology, 1921, Mar., 78-84. Comparison of scores
made by Junior High School class on a group intelligence test and the marks
made by the same children over a year later to determine prognostic efficiency
of the tests. IX. Further data with regard to sex differences. Sex differences
in mental and emotional traits as determined by various tests.

Educational Guidance and Tests in College. Stephen S. Colvin, Journal 
Psychology, 1921, March, 46-56. Description of a series of tests for use in va- 
system of educational advice and direction for its students based on psycholog-
ical tests.
A Test Series for Journalistic Aptitude. Max Freyd, Journal of Applied
Psychology, 1921, March, 46-56. Description of a series of tests for use in vo-
cational selection and guidance in the field of journalism.


Tests in Industry. Morris S. Viteles, Journal of Applied Psychology, 1921,
Mar., 57-63. Need for tests for selection of workers in industry and difficulties
attending the development of such tests.


The Problem of the Unselected Group in the Standardization of Tests. S. 
L. Pressey, Journal of Applied Psychology, 1921, Mar., 64-71. The problem 
of obtaining unselected, representative groups of cases for the determination 
of norms on the various types of test. 


LEARNING IN THE SCHOOL SUBJECTS. 


Analysis of Learning Processes and Specific Teaching. Charles H. Judd, 
Elementary School Journal, 1921, May, 655-664. Following up tests by analysis 
of results, study of particular cases, and specific teaching based on inten- 
sive analysis. 











SPECIAL REVIEW OF MRS. BURGESS’ MONOGRAPH ON 
SILENT READING. 


BURGESS, MAY AYRES. The Measurement of Silent Reading. New
York: Russell Sage Foundation. 1921. Pp. 163. 


One of the major virtues of Mrs. Burgess’ work is that it uses the 
analytical method and defines, with some degree of detail and exact- 
ness, what is being measured. It has become the custom in recent 
years for authors of tests to proclaim with complete complacency 
that they do not know what they are measuring. The criterion of 
validity which has been accepted by such makers of tests is a high 
correlation with some other test equally vague in its purposes and 
its object of attention. The result of this cumulative indefiniteness 
is that it is easy to get on the open market a great many hastily 
devised tests that have been handled and rehandled by formal statis- 
tical methods, but are altogether nondescript with regard to their 
value as instruments of educational diagnosis. 

The vagueness that has characterized tests has appeared also in 
the definition of school subjects. What has not been included in the
last few years under the term “reading”? All sorts and kinds of 
exercises which use printed words have been called tests in reading, 
whether they measure the mechanics of reading or the most abstract 
forms of reasoning. On the other hand, the importance of reading 
has often been overlooked in such tests as the so-called reasoning 
test in arithmetic. Authors of reasoning tests seem to assume that
every child can get from the printed page the ideas involved in solv- 
ing an arithmetical problem, even when the statement of the prob- 
lem is intricate enough to baffle the reading power of an adult. 

Mrs. Burgess very properly challenges the work of the vague 
testers and sets an example which ought to give pause to the reck- 
less publication of half considered devices for measuring mental 
processes. She points out that every test should aim at some par- 
ticular point and not try at one stroke to do everything. She then 
tries out one after another of the devices which seem to her to fit 
her particular purposes and discards the tests which are not satis- 
factory, until she develops a perfected instrument. All this is done 
without hiding behind a smokescreen of higher mathematics that is 
intended to frighten off the critic and provide a shelter for vague- 
ness and lack of insight. 




Such critical comments as can be added to the foregoing approval 
of what has thus been accomplished in Mrs. Burgess’ book ought 
perhaps to be postponed until the full impact of her discussion of 
tests has had time to be felt. At the risk of making a mistake and 
with the hope of promoting rather than in any way hindering the 
progress of analysis, I shall venture to point out some of the detailed 
difficulties which I find in the book. 

The list of factors controlling silent reading, given on pages 37 
and 38, evidently includes a mixture of many different kinds of 
items. Some are psychological, such as attention span; others are 
wholly objective, as uniformity of print. Some are very easy to
keep constant; these are the objective factors. Others are much
less accessible to the experimenter, and it is by no means as certain
that the test has succeeded in keeping them constant. Indeed, it is
the belief of the present writer that attention span is always a
variable. Whenever we work in the psychological laboratory we find
that we must take into account fluctuations of attention. It is
never possible to get two individuals with the same span of atten-
tion, however constant we make the external conditions. It would
seem wise, therefore, in cataloguing the constants in a reading
experiment to classify external and internal factors separately, and
to interpret the final result with due regard to the impossibility of
holding subjective factors constant in the same sense in which we
can control objective factors.


The foregoing discussion will make clear the reason why there is 
objection to the statement made by Mrs. Burgess on page 61. She 
writes, “The process by which the essential characteristic of con- 
stancy is obtained in educational measurements is the one used in 
physical measurements. It consists of distinguishing the possible 
controlling, varying factors; devising means for holding them all 
constant save one; and measuring that one. This is the law of 
the single variable.” 


The objection to this formula is that in matters psychological it 
is the organized whole which is important rather than any single 
factor. The history of reaction-time experiments is instructive in 
this case. It was assumed at one time that a simple reaction is a 
detachable part of a complex reaction. The subject of the experi- 
ment was asked to lift his hand at a given signal and the time of 
this simple reaction was measured. Later the same person was 
asked to react under more complicated conditions. He was to
lift his right hand if the signal was of one type, and his left if it
was of another. The time required for this choice reaction, as it
was called, was longer than the time required for the simple re-
action. It was argued that the choice reaction was made up of
the simple reaction, plus the factor of choice. The argument ran
on this wise: in both complex and simple reactions the same eyes
or ears receive the signals, the same hand responds, the difference is
the one item of choice.


Experimenters worked long and arduously on the assumption that 
their arguments were valid, but their results were full of curious 
inconsistencies and finally led them to see that a complex reaction- 
time is not a simple reaction plus an additional factor. A complex 
reaction is a new organized whole. It cannot be torn into parts
as can a physical compound. It is what it is by virtue of its
organization.


This lesson from the history of psychology should be taken seri- 
ously as a guide to all who undertake psychological analyses. 
The principles of such analysis cannot be borrowed from physical
science, and the factors sought are of a character different from
the physical factors involved in an experiment in the natural science
laboratory.


The elaboration of this criticism is not intended to suggest scepti- 
cism as to the validity of Mrs. Burgess’ test. Her analysis has gone 
far enough to make her instrument of diagnosis very useful in a 
practical way and to render it much more definite in purpose than 
most tests. The refinement of her method and the final solution of 
the reading problem wait, however, for the further productive 
analysis of the reading situation. The plea which is made here is 
for more analysis, guided by experience collected through such 
studies as Mrs. Burgess has made. 


CHARLES H. JUDD.





In this important monograph on silent reading, Mrs. May Ayres
Burgess has rendered at least three services: (1) She has made a
useful exposition of the laws of a scientific procedure in the con-
struction of reading tests; (2) she has contributed a suggestive
analysis of reading as a function, and (3) she has set up an admir-
able sample of the experimental and statistical study of an instru-
ment that should precede, rather than follow, its publication for
general use.


In the discussion of scientific methods as applied to the con- 
struction of reading tests, Mrs. Burgess advances, as the assump-
tion of first importance, the “Law of the Single Variable.” “It con-
sists of distinguishing the possible controlling, varying factors; de-
vising means for holding them all constant save one; and measur-
ing that one. This is the law of the single variable.” P. 61. The
spirit of this dictum is admirable, but the wording is too inflex- 
ible. Students of the history of scientific methods are familiar with 
this law as it is applied in the physical and mental sciences and 
will agree with Mrs. Burgess that absolute control of all variables 
save one is the ideal of scientific procedure. In the biological 
sciences it is not always possible to follow it rigidly. It is really 
not necessary and sometimes not possible, however desirable, to 
hold all variables constant. It is often only necessary or feasible 
to take all variables into account. In the physical sciences, as Mrs. 
Burgess points out, it is usually best and usually possible to control 
all variables save one, but in dealing with integrated human re- 
actions, it is frequently impossible. Our method is still scientific, 
however, if we can observe and measure the variables. In Gray’s 
oral reading test, for example, speed cannot be held constant when 
accuracy is being measured, but if speed is measured and allowed 
for, the demand of science is met. It is sometimes possible and 
proper to sidestep a variable, as, for example in the Thorndike test, 
by giving a maximum of time when the purpose is to measure 
power of comprehension freed of the mechanics of reading. Doubt-
less it was Mrs. Burgess’ intention chiefly to indicate the neces-
sity of thus taking into account the several variables and she has,
in fact, given an admirable sample of an effort to analyse silent read- 
ing into elemental factors. The necessity of controlling the vocab- 
ulary, phraseology, sentence structure, the motor reactions of carry- 
ing out the directions, the length of the paragraphs, the interest to 
the child, etc., and of eliminating arithmetical information, memory 
and other factors not necessarily involved in reading ability, is 
effectively considered. It is no reflection on the theoretical dis- 
cussion if all of the variables involved in the author’s own test are 
not perfectly controlled. The various paragraphs were standardized 
for difficulty by selecting those whose directions were fulfilled by
the same percentage of children. By the use of this single criterion 
it was possible for the time required to read and the time required 
to draw the supplements to vary from paragraph to paragraph. 
As a result of an experiment this was found to be the case. It 
should be added, however, that while these factors were not experi- 
mentally controlled an effort was made to equalize them empirically. 
If a test constructed with such care as the Burgess scale shows such
defects, what of most others?


The monograph calls our attention anew to a really neglected
line of research: that of the interrelations of the several variables
or “dimensions” of functions such as quality, accuracy, difficulty
and speed. What is the effect upon quality of handwriting when
speed is varied? How can we control the one while measuring the
other and, if we cannot, what allowances are to be made per unit
change in either? In reading, writing, composition, etc., the variables
are related in still greater complication.

A most commendable feature of the work with the Burgess 
Scale is the fact that statistical studies of its validity and 
reliability were conducted before the test was put on the market. 
We have more than the author’s opinion that the test does measure 
reading and a notion of how well it does it. In connection with the 
question of “reliability” or “consistency,” Mrs. Burgess has pre- 
sented an unusually clear account of the limitations of our conven- 
tional statistical methods and interpretations. The co-efficient of 
reliability, dependent upon the correlation of repeated tests with 
the same instrument “may constitute in some measure a basis for 
valid criticism of the test, but in the main (it appears) to reflect 
a real and inevitable variability of human performance. The im- 
portant fact to remember about such scores is that they may vary 
from day to day and still be actual true measures of ability on each 
occasion. Under such conditions the fact that the scores vary
from trial to trial does not reflect any inaccuracy or inadequacy of 
the test,” p. 131. It does, however, indicate an inadequacy of the 
test results for practical purposes and clearly indicates that further 
study of the test is necessary. It is true that the co-efficient does 
not itself indicate the “cause” of variability. We may blame the 


*The data are to be presented in the October issue of this journal.

child, if we please, but we cannot change the child whereas we can
change the test. It may be that the test should be lengthened, or 
that the units should be more carefully equalized or made more fine, 
or—well, only further experimental study will disclose what 
changes may be profitably made. At any rate, it is certain that a 
test to meet present-day requirements must yield highly consistent 
results. 
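
The co-efficient of reliability discussed above may be illustrated numerically. The sketch below is offered only as an illustration and is not taken from the monograph: the two sets of scores are invented, the function names are hypothetical, and the Spearman-Brown formula is introduced here as the conventional estimate of what lengthening a test might do to its consistency, on the assumption that the added material behaves like the original.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, k):
    """Estimated reliability of a test lengthened k times (assumes comparable added material)."""
    return k * r / (1 + (k - 1) * r)


# Invented scores of six pupils on two trials of the same test.
first_trial = [4, 7, 9, 12, 15, 18]
second_trial = [6, 5, 11, 10, 17, 16]

r = pearson_r(first_trial, second_trial)   # co-efficient of reliability
r_doubled = spearman_brown(r, 2)           # estimate for a test of double length

print(f"reliability of the test as given: {r:.2f}")
print(f"estimated reliability if doubled:  {r_doubled:.2f}")
```

Under these assumptions a lengthened test yields a higher estimated co-efficient, which is one sense in which the test, rather than the child, may be changed.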

Perhaps the most fortunate outcome of Mrs. Burgess’ vigorous 
discussion will be its effect upon our attitude toward new tests. 
With this admirable example of scientific analysis and research before
us, it will be quite inexcusable for anyone to throw on the
market a new test whose validity and reliability have not been very
thoroughly determined.


ARTHUR I. GATES. 





Whatever else may be said of the latest educational monograph 
issued by the Russell Sage Foundation, there is no gainsaying the 
fact that “The Measurement of Silent Reading” and the accompanying
Scales merit the careful consideration of serious thinkers and
workers in the field of educational measurement. Dr. Burgess set
for herself a task of no mean proportions and her book is an exceptionally
clear-cut and well-organized record of her procedure, and
an interesting discussion of some of the principles and laws underlying
educational measurement. The raison d’être of every step in
the construction of the scale is given and the alternatives or methods 
used by other workers are critically examined and evaluated. The 
limitations of existing scales are set down at length. Several chap- 
ters are given to the discussion of the law of the single variable and 
its implications in the field of educational measurement. Dr. Bur- 
gess lists variables of quality, of difficulty, and of amount, and main- 
tains that the student of educational measurement must consider 
first which of these three he will attempt to measure, and then use 
the type of scale adapted to the measurement of that variable, excluding
from the test or rigidly controlling other variables, so
that comparisons based on measurement of one variable may not be 
adulterated by intruding factors. 

A scale like the Ayres Burgess Silent Reading Scale certainly 
facilitates comparison inasmuch as the units of the score are equal 
quantities of a single variable, measured under conditions in which
other factors are carefully controlled. This makes it an effective
survey instrument or device for sampling the abilities of large groups. 

While it is thoroughly scientific to construct a test with due re- 
gard for the law of the single variable, the practical and scientific 
value of a test depends even more upon its reliability and validity 
than upon its simplicity and statistical qualifications. The ques- 
tion of validity is not discussed in the monograph. The test ma- 
terial is sufficiently unlike that used in most other school exercises 
to make this consideration significant. Furthermore, the very fact 
that the whole test is of approximately the same difficulty and re- 
quires but one type of reaction, leads one to wonder how much de- 
pendence may be placed on the scores in gauging the reading abili- 
ties of individuals. There are so many specialized reading abili- 
ties necessary to the proper performance of school duties and the 
satisfaction of the responsibilities and privileges of society. Diag- 
nostic analysis of reading deficiencies points the way to specific 
training to the end that pupils acquire habits and sets which func- 
tion in response to the varying demands of particular situations. 
General advice based on performance under one set of conditions 
is of doubtful value. The solution is a group of tests covering the 
specific reactions demanded in a representative group of reading 
situations, each one meeting the other scientific requirements so 
ably set forth by Dr. Burgess in her monograph. 


LAURA ZIRBES.


1. Three Psychological Studies of School Children.—The literature
dealing with the superior child is growing constantly every year, 
and two of the studies to be reviewed here deal with that subject. 
These two studies, however, show a great contrast in method, inasmuch
as the one seems to revert to subjective methods for selecting
superior children, while the other bases the selection upon objective
mental tests.


The short monograph by Badenes¹ is, therefore, very disappointing
to the psychologist. It lists nine pages of items or traits, mental,
emotional, physical and so forth, supplies a chart on which the 
teacher is supposed to evaluate or describe these traits and that 
seems to be about all. The great value of mental tests seems to be 
disregarded, and the recent progress in the construction of rating 
scales, very pertinent to the author’s method, has been entirely 
ignored. The historical references in the introduction center around 
Stern and Meumann, while the valuable contributions of the United 
States in the work with superior children are scarcely mentioned. 


Superior children only enter incidentally as one of the groups of 
exceptional children studied by Gesell.² The book reports chiefly
the results of a rapid survey of 24,000 elementary school children in 
New Haven. The exceptional children were reported by the teach- 
ers and physical as well as mental defects were listed. There were 
370, or about 1.5 per cent reported mentally deficient. In contrast 
with this only 45 were reported mentally superior, and the author 
comments on this fact. It strengthens the reviewer’s conviction 
that mental tests are more needed for the selection of the superior 
than they are for the inferior. The great discrepancy between the 
numbers of superior and inferior children reported is, in the present 
instance, to some extent due to the fact that the New Haven 
survey emphasized the question of mental deficiency. All the 
children reported by the teachers as mentally deficient were


¹Badenes, J. E. The First Practical Steps in Selecting Gifted Children in a Large
City School. New York. 1921. Pp. 22.


²Gesell, A. Exceptional Children and Public School Policy. Yale University Press.
1921. Pp. 66.

given Doll’s short Binet and some other tests. Some of the chief
results are given on figures 12 and 13, although there seem to be 
serious discrepancies as shown on the figures. Furthermore, the 
comparison of the distribution curves on figure 12 is vitiated by the 
difference in the totals of the two groups compared. The results of 
this survey are used to make specific and valuable recommendations 
for the local situation and a model program for the community care 
of mentally deficient school children is presented. 
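
The proportions reported in the New Haven survey can be checked by simple arithmetic. The short sketch below uses only the three figures quoted in this review; it assumes that the round figure of 24,000 is the proper base for both groups, and the variable names are merely illustrative.

```python
# Figures as quoted in the review of Gesell's survey.
surveyed = 24000           # elementary school children surveyed (round figure)
reported_deficient = 370   # reported mentally deficient
reported_superior = 45     # reported mentally superior

pct_deficient = 100 * reported_deficient / surveyed
pct_superior = 100 * reported_superior / surveyed
ratio = reported_deficient / reported_superior

print(f"reported deficient: {pct_deficient:.2f} per cent")
print(f"reported superior:  {pct_superior:.2f} per cent")
print(f"deficient reported for each superior child: {ratio:.1f}")
```

On these figures about eight children were reported deficient for every one reported superior, which is the discrepancy upon which the author comments.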


The last of our monographs by Town³ is an intensive study of 52
children during the first months of their school life. There are very 
few children in the world who have been so thoroughly measured, 
physically, anthropometrically, and mentally, as have the 52 children 
of this study. As many as 51 mental tests are shown on the psycho- 
logical profiles, and because of the great number of tests these pro- 
files become exceedingly difficult to read. Profiles based upon 
median percentiles of groups of allied tests would have been more
illuminating. Apart from the Binet and the Stanford Revision, 
only two tests, the Knox-Pintner Cubes and the Porteus Maze, were 
interpreted in the light of previous standardizations. In both 
cases these previous standardizations do not seem to suit the author 
and she remarks ambiguously that neither of the tests shows much
correlation with chronological age. This is a very peculiar state- 
ment to make after testing only 31 five-year olds, 12 six-year olds 
and one seven-year old, all of whom were children in the first year 
of school. An inverse correlation with chronological age is to be ex- 
pected with such a group, and indeed on the cube test we find that 
the median at age five is two years advanced, at age six exactly age 
six, and at age seven (one case) two years retarded. 
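
The inverse relation noted above may be made concrete with the three medians quoted for the cube test. This is a rough sketch only: it treats each age group as a single point, ignoring the very unequal numbers of cases, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Chronological ages and the median test ages implied by the review:
# two years advanced at five, at age at six, two years retarded at seven.
chronological_age = [5, 6, 7]
median_test_age = [7, 6, 5]

r = pearson_r(chronological_age, median_test_age)
print(r)  # -1.0
```

With the group medians alone the correlation is perfectly inverse; the correlation among the individual children would of course be less extreme.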

R. PINTNER.





2. Educational Psychology by a French Writer.—The eighth edition
of Claparède’s work was issued in 1920, the first in 1905. An
English translation of the fourth edition appeared in 1911. The
present volume¹ is very much enlarged since the fifth edition of
1915; but, with very few exceptions, the bibliographical references 





³Town, C. A. Analytic Study of a Group of Five and Six-Year-Old Children. Studies
in Child Welfare. Univ. of Iowa. Vol. 1, No. 4. May, 1921. Pp. 87.


¹Claparède, Ed. Psychologie de l’enfant et Pédagogie expérimentale. Geneva: Kundig,
1920. Pp. XL-571.

have not been brought up to date. The principal changes are the
omission of the chapter on fatigue and the very great extension of 
two other chapters, so that the work in its present form is more 
than twice the size it was ten years ago. 

Let no one be misled by the title into expecting a treatise on
child nature. It is rather an argument for the value of experimental 
education, telling why we should study children, how to do so, and 
who has done so, with a minimum of results found except as they 
illustrate points in method. At the beginning is given an excellent 
outline of the history of child study, with a summary of the work 
done in different countries to date. Then follows an exhaustive 
analysis of the problems with which the science has to deal, and the 
methods by which it attacks them. The last chapter, on mental de- 
velopment, takes about one-quarter of the book only, and fully one- 
third of that is devoted to the topic of play. After discussing the 
various theories of play the author announces his own view of its 
function: that of allowing to the individual a realization of the 
self by following a line of greatest interest when more serious ac- 
tivities do not afford scope for that for the time being. Classifica- 
tions of play follow, and a brief description of the dominant interests 
at different ages, with a plea for their greater utilization in school 
tasks. Evidently in sympathy with Dewey’s writings he would 
surely advocate the introduction of the project method to the schools 
of his country. M. T. WHITLEY. 





3. A Helpful Summary and Interpretation of Experimental Evidence
on Silent Reading.—O’Brien’s recent book, Silent Reading,² is
undoubtedly the most helpful single summary and application of con- 
temporary psychology in the field of school reading that has been 
published. It does two things very well indeed. First, it discusses, 
in language that the school administrator and teacher can under- 
stand, the previous scientific investigations which have dealt with 
movements of the eyes during the reading process; with compari- 
sons of oral and silent reading; with the factors which contribute 
to rapid silent reading. (He finds the most important to be, (1) 
practice in rapid silent reading; (2) decrease of vocalization; (3) 
training in perception; (4) character of subject-matter; (5) habits 
of eye movement; (6) purpose for which subject-matter is read; (7) 





²O’Brien, J. A. Silent Reading. New York: MacMillan, 1921. Pp.

concentration of attention; (8) ability to grasp the meaning of
contents, etc.) with the difficult and necessary reduction of vocalization;
and with training in increasing perception. I know no better
statement of the present investigational status of these matters
than Dr. O’Brien gives. 

On the side of positive and constructive training of children 
O’Brien reports his very extensive experimental investigation of 
methods of obtaining rapid and effective silent reading in 40 school 
classes. (Grades III-VIII.) As a result of this investigation and 
by interpreting contemporary thinking as it concerns the teaching 
of reading, he presents definite recommendations for the develop- 
ment of rapidity in rate of reading, for reducing vocalization and 
for increasing perceptual ability. H. O. R. 


4. A Book on Mental Tests by an English Writer.—“Why is it
that America has been moving so rapidly in the matter of mental 
tests while England has almost stood still? The answer is simple: 
Speaking generally, Americans believe in psychology, but English- 
men do not. When America entered the war, one of the first things 
she did was to mobilize her psychologists. The war was nearly over 
before England discovered that psychologists were of any use.” 

This quotation is from a recent English book,¹ in which the author
undertakes a task which would be unnecessary today in America, 
namely, to persuade the teacher to “believe in psychology.” 

In the first chapter he replies to those who instinctively dislike
the idea of bringing measurement into education, and gives a brief 
historical account of the development of Mental Tests. This is fol- 
lowed by a discussion of the subject of general intelligence, and of 
the work of Binet. Though the Terman revision is accessible in 
England and is commended by Mr. Ballard, he quotes in his text 
the translation of Binet’s tests which was made by Mr. Cyril Burt 
in consultation with Binet’s collaborator, Dr. Simon. Mr. Burt re- 
arranged the tests in order of difficulty, and made the age-assign- 
ments in accordance with the results of his experiments with a 
large number of children in the London Elementary Schools. Unfortunately,
the exact number of children is not stated.

It is interesting to compare the different ages to which Burt and 
Terman assign the same tests. In several of the lowest tests, the 





¹Ballard, P. B. Mental Tests. London: Hodder & Stoughton. 1920. Pp. IX + 235.

American child is in advance of the English, while later he seems
to be out-stripped; e. g., in the repetition of numbers the age stand- 
ards are as follows: 


                           Terman       Burt
                           (American)   (English)
Repetition of 3 numbers    Age 3        Age 4
Repetition of 4 numbers    Age 4        Age 5
Repetition of 5 numbers    Age 7        Age 6
Repetition of 6 numbers    Age 10       Age 9
Repetition of 7 numbers    Age 14       Age 11


In giving this test Burt read the numbers at the rate recommended 
by Binet, i. e., two per second, while Terman’s directions give “a 
slightly faster rate than one per second.” 

The American is evidently quicker in verbal repetition all through 
his childhood, as at the age of three he can repeat six syllables, an 
accomplishment of the English four-year old; at five he can manage 
13-15, while the English child can repeat only 10; at six, he repeats 
16, the English seven-year standard. 

It is at the higher age levels that the English boy or girl out- 
strips the American, e. g., the fifteen-year old does one of Terman’s 
“Average Adult” tests and two at the “Superior Adult” level. The 
only test for English fifteen-year olds that Americans can do earlier 
is the question, “What are the three chief differences between a King
and a President?”—a test obviously more suited to Americans.


Both Mr. Ballard and Mr. Burt feel the inadequacy of Binet’s 
tests, and also of Terman’s additions, for the discovery of the bright- 
est children. For this purpose Mr. Burt has drawn up a series of 50 
Reasoning Tests for ages 7 to 14, (re-printed in this book). They 
are individual oral tests, and the score can easily be changed to a 
Mental Age, and a Reasoning Quotient obtained. They are in- 
genious little puzzles, involving the application of thought to the 
ordinary affairs of life, and almost as applicable to America as to 
England,—in fact, they have recently been tried with success in this
country. 
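
The review does not state how the Reasoning Quotient is computed from the Mental Age. The sketch below assumes, by analogy with the familiar Intelligence Quotient, that it is one hundred times the ratio of mental to chronological age; this formula, the function name and the example ages are assumptions made for illustration and are not taken from Mr. Ballard’s book.

```python
def reasoning_quotient(mental_age, chronological_age):
    """Assumed quotient: 100 x mental age / chronological age (by analogy with the I.Q.)."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# Hypothetical child of ten whose reasoning score corresponds to a mental age of twelve.
example = reasoning_quotient(12, 10)
print(example)  # 120.0
```

The conversion of a raw score on the fifty tests into a Mental Age depends upon Mr. Burt’s age standards, which are not reproduced here.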

The second part of the book is concerned with Educational Tests, 
prefaced by a clear and useful account of statistical distribution 
and dispersion. The tests and age-standards given here, in read- 
ing, spelling and arithmetic, show that England is beginning to 
strike out for herself along the path opened up by Thorndike, Courtis 












and others. Ballard acknowledges his debt to the American school, 
and appeals again and again for age standards instead of grade
standards, the latter being unintelligible to English readers. 

The book will undoubtedly serve its purpose,—to popularize the 
subject with English teachers, and to disprove the allegation of 
the British press that mental tests are nothing but “new American 
fads.” E. I. Newcome, 

Institute of Educational Research, Teachers’ College.





5. Physical Growth of Children.—This monograph¹ offers a complete
survey of investigations on the physical status and growth of
children, together with a report of the technique and results of the 
extensive work of the Iowa Child Welfare Research Station under 
the direction of Bird T. Baldwin. 


Part I, dealing with the anthropometric instruments and methods, 
will serve as an excellent manual for class work. Part II gives
the data on the weight and height of infants, including tables of
norms, correlations and 400 individual growth curves. Part III
deals with anatomical and physiological age and growth, in which,
among other measures, Roentgenograms are used as criteria. Part
IV is a historical survey of 911 investigations in the field, and Part 
V includes tabular summaries of all available data, comprising 
nearly five and a half million cases. Part VI is an annotated bib- 
liography of the 911 articles and Part VII gives tables of English 
equivalents for the French metric system. 

The monograph has no peer as a manual for advanced students 
and for teachers in this field. It is exhaustive in content and the 
technical manipulation of data is excellent. It contains important 
original contributions, especially in the form of continuous growth 
curves obtained by re-tests of the same children and the correlations 
of anatomical with physiological, mental and emotional capacities.
A. I. G.


6. An Elementary History of Education for Normal Schools.²—
This book by Dr. Finney is one of the new Modern Teachers Series, 
edited by Dr. W. C. Bagley. It is designed to give the prospective 
teacher some idea of the structure and purpose of American public 
education. 





¹Baldwin, Bird T. The Physical Growth of Children from Birth to Maturity. University
of Iowa Studies in Child Welfare. 1921. Vol. 1, No. 1. Pp. 411.

The book deals in an elementary way with 1. The Colonial Period,
1667-1776; 2. Period of Nationalization, 1853-1861; 3. The Great
Educational Awakening, 1853-1861; 4. The Transition Period,
1861-1890; and 5. The Recent Period, 1890-1920. European influences
are discussed for the most part by the inclusion of chapters on Rous- 
seau, Pestalozzi, Herbart and Froebel. On the other hand, such 
topics as the English Poor Laws and the Apprenticeship system 
which had an important bearing on American education receive too 
little space. The recent period discusses (a) educational reorgani- 
zation (b) enrichment of the curriculum and (c) educational theory 
and practice. 


The book tends to be encyclopedic in that it discusses many important
movements in a paragraph. References and suggestive
problems at the end of the chapters would have been of distinct help
to the normal school student, for whom it is designed. However,
as a brief survey, made concrete by the inclusion of numerous pic- 
tures, charts and diagrams, the American Public School should 
help to make beginning teachers intelligent as to the historical de- 
velopment of public education and should serve to stimulate them 
to work for the improvement of the American school system. 

E. U. RUGG.





7. A Book Describing An Elementary School Curriculum Based 
on Year-Long Series of Projects.—That school people run to educa- 
tional “movements” has been exemplified by the recent fervor over 
organizing teaching methods on the basis of so-called “projects.” 
The chief protagonist of the movement definitely restricts the use of 
the term to method. His followers, however, are now engaged in 
rebuilding the curriculum on the basis of it. Probably the most ex- 
treme instance of such an application in curriculum making is Miss 
Wells’ A Project Curriculum.’ Dr. Wells has attempted to construct
a curriculum in which all instruction of each of the first six grades
is organized in one continuous year-long series of activities or “projects.”

Children were actually taught by this method under her super- 
vision in a school in Trenton, N. J., in the first three grades. Her 
course for the fourth, fifth and sixth grades is still theoretical and 





²Finney, R. L. The American Public School. New York: MacMillan. 1921. Pp.
XIV + 324.

‘Wells, Margaret E. A Project Curriculum. J. B. Lippincott Co., Philadelphia. 1921.
Pp. XIII + 338.

is sketched only in outline in her book. The first grade “project”
is Playing Families; the second grade, Playing Store; the third 
grade, Playing City. 

This pedagogical innovation is based upon the notion that each 
phase of school work shall be so far as possible a replica of a life
activity. According to this theory, since people live in families the 
children shall be actually organized as families and the entire work 
of a school year shall be carried on as a series of things which an in- 
dividual in a group or a family would do. Thus, children are as- 
signed different roles in the family; the work of the year is thor- 
oughly dramatized; “doll families” are made and dressed, thus 
providing the “motivation” (the “stimulating environment” of the 
free-educationists) for even the arithmetic! And most of these cur- 
riculum reformers would go so far as to say that such situations— 
such “life activities”—must be found through which all the skills,
all the necessary information, all training in problem solving,
and all fundamental attitudes are to be developed. The
protagonists of such a method do not compromise with those who 
demand definite and economical practice on socially worth-while 
skills. They say, with Miss Wells, that you get the skill without 
specific repetition, under the “intense motivation” of “life situa- 
tions”! 

We have insufficient review space for a detailed critical analysis 
of this book or this theory. An article or a monograph should be 
written upon it, showing of how much of current psychology these 
believers fail to make use. Suffice it to say here that Miss Wells 
reports in detail the theses and principles upon which her curricu- 
lum is based. She attempts to show outcomes but succeeds very im- 
perfectly. She merely lists the facts, skills, habits, attitudes, and 
ideals which must have been employed by the children. How much 
skill? How well are the facts learned? What problem solving abili- 
ties have been developed? The answers to these questions neither 
we nor Miss Wells know for, astonishing as it may seem, she did not 
measure to find out! 

This book may be regarded only as an interesting suggestion ap- 
plying the motives of “free” education, of a certain form of “project 
method” to the making of the curriculum—an application which it 
is very doubtful indeed if the leaders themselves in these movements 
will accept. H. O. R.

9. Empirical Studies in School Reading.—Under this title one of
the “Teachers College Contributions to Education” undertakes, among
other things, to analyze and classify the study helps in four sets of 
Literary Readers used in grades IV to VIII. No attempt is made 
to report or analyze the literary content of these readers in the light 
of the valuable criteria assembled and made available in the opening 
pages of the study. The point of view is that of method. Consider- 
able value attaches to the assembled authoritative quotations on the 
nature and purpose of literature and the aims and various methods of
studying it. The compilers of literary readers are not listed among 
those whose writings were canvassed. While prefatory statements 
in their compilations are in general agreement with the aims and 
purposes advocated by the literary authorities previously mentioned, 
the study helps, questions and directions to pupils are decidedly 
formal in nature. Those who advocate in their texts so much 
language training and other formal work have, no doubt, considered 
this training essential and have thought to improve such training by
using material of high literary value. That literary appreciation 
is another matter, hardly attainable as a by-product of such in- 
struction is clearly demonstrated by the verbatim reports of ex- 
perimental lessons and the experimental evaluation of methods 
and devices. 

While the methods of study outlined in the books analyzed did 
not vary substantially from the methods used by a random sampling
of Chicago teachers, none of the teachers used the helps suggested 
in the readers nor did the pupils refer to them. 

One hundred and thirty-one persons of long experience in school- 
work were asked to rank eighteen questions in the order of their 
merit as aids in the study of literature. These rankings emphasize 
the value of questions which draw on the pupil’s related experiences 
and assist him to realize in imagination the experiences of the poet. 
Matters of fact and literary technique were ranked low. While 
these judgments contradict teaching and text book practice, they 
harmonize with the preponderance of expert opinion of writers on 
literature and literary study. 

Of the two methods set up and used experimentally, and tested 
under controlled conditions, the one which emphasized the sympathetic


¹Empirical Studies in School Reading. James Fleming Hosic, Ph. D. Teachers College,
Columbia University Contributions to Education, No. 114. New York City. 1921. Pp.
VIII + 174.

approach and gave opportunity for picturing and imaging
the experiences in the selection was found superior to the one which 
stressed technique, analysis and factual detail. 

The conclusions reached in the study merit careful consideration 
by those who are interested in the improvement of textbooks, and by 
those who realize the psychological significance of the aesthetic 
experiences to which the study of literature may open the way. 

L. Z. 





III. ADDITIONAL PUBLICATIONS RECEIVED.*


A. MENTAL AND EDUCATIONAL TESTS.
SANTA ANA (CAL.) PUBLIC SCHOOLS, DEPT. OF RESEARCH. Four Years of
Standard Tests and Measurements. By Mary B. Henry, 1921, Paper. 
Pp. 27. 


B. PUBLICATIONS IN THE GENERAL EDUCATIONAL FIELD.
BERKSON, I. B. Theories of Americanization. Teachers College Contributions 
to Education; No. 109. New York: Teachers College, Columbia Univer- 
sity, 1921. Pp. VIII + 226. 


HALL, G. STANLEY AND HIS STUDENTS. Aspects of Child Life and Education. 
New York: Appleton & Co. 1921. Pp. XV +826. (Reprint.) 


HOSIC, J. F. Sample Projects. 506 W. 69th St., Chicago. 1920. Paper.
Pp. 32.


POWERS, S. R. A History of Teaching of Chemistry in the Secondary Schools
of the United States Previous to 1850. University of Minnesota, Minne- 
apolis, Minn. 1920. Paper. Pp. 69. 50 cents. 


REAVIS, G. H. Factors Controlling Attendance in Rural Schools. Teachers
College Contributions to Education No. 108. New York: Teachers Col- 
lege, Columbia University. 1921. Pp. 69. 


THORNDIKE, E. L. The New Methods in Arithmetic. Chicago and New York: 
Rand McNally Co., 1921. Pp. VIII + 260. To be reviewed in the October
issue. 


C. NEW SCHOOL TEXTBOOKS.


CORYELL, H. V. AND HOLMES, H. W. Word Finder. Yonkers (N. Y.): World
Book Co., 1921. Pp. VIII + 150. 

DUNN, ARTHUR W. Community Civics and Rural Life. Boston: D. C. Heath
& Co. 1920. Pp. XII + 507. 


FINCH, CHAS. E. Everyday Civics. New York: American Book Co. Pp.
X + 326. 


*Publications which are reviewed in this issue are not listed here.